ABSTRACT

We identify 885,503 type 1 quasar candidates to i < 22 using the combination
of optical and mid-IR photometry. Optical photometry is taken from the
Sloan Digital Sky Survey-III: Baryon Oscillation Spectroscopic Survey
(SDSS-III/BOSS), while mid-IR photometry comes from a combination of data from
the Wide-Field Infrared Survey Explorer (WISE) “ALLWISE” data release and
several large-area Spitzer Space Telescope fields. Selection is based on a Bayesian
kernel density algorithm with a training sample of 157,701 spectroscopically-confirmed
type-1 quasars with both optical and mid-IR data. Of the quasar candidates,
733,713 lack spectroscopic confirmation (and 305,623 are objects that
we have not previously classified as photometric quasar candidates). These
candidates include 7874 objects targeted as high probability potential quasars with
3.5 < z < 5 (of which 6779 are new photometric candidates). Our algorithm is
more complete to z > 3.5 than the traditional mid-IR selection “wedges” and
to 2.2 < z < 3.5 quasars than the SDSS-III/BOSS project. Number counts
and luminosity function analysis suggests that the resulting catalog is relatively
complete to known quasars and is identifying new high-z quasars at z > 3. This
catalog paves the way for luminosity-dependent clustering investigations of large
numbers of faint, high-redshift quasars and for further machine learning quasar
selection using Spitzer and WISE data combined with other large-area optical
imaging surveys.

Subject headings: catalogs — quasars: general — methods: statistical — infrared: 
galaxies 


1. Introduction


Recent years have seen considerable growth in the number and density of known actively
accreting supermassive black holes in the form of active galactic nuclei (AGNs) and luminous
quasars. For example, X-ray surveys now reach AGN densities of more than 9000 deg^-2 (e.g.,
Xue et al. 2011), albeit over areas of << 1 deg^2. Spectroscopic follow-up of broad-band optical
imaging from the Sloan Digital Sky Survey-I/II/III (SDSS; York et al. 2000) project has
expanded the number of confirmed quasars to over 270,000 objects (Schneider et al. 2010;
Paris et al. 2012) over roughly 1/4 of the sky. Mid-infrared (MIR) selection from WISE and
Spitzer allows AGN selection (both unobscured and obscured) over the full sky to densities
of over 60 deg^-2 (Stern et al. 2012; Assef et al. 2013). Deep large-area optical surveys such
as the Dark Energy Survey (DES; The Dark Energy Survey Collaboration 2005) and the
Large Synoptic Survey Telescope (LSST; Ivezic et al. 2008) will considerably expand the
number of known AGNs even in already well-mapped areas of sky, especially at high-z and
for low-luminosity AGNs in compact galaxies.

Our own work has sought to expand the ranks of known quasars by applying modern
statistical techniques to optical imaging data instead of performing spectroscopy, increasing
the number of known quasars to as many as 1,000,000 (Richards et al. 2004, 2009a;
Bovy et al. 2011) and enabling simultaneous multi-wavelength (optical plus MIR) selection
using those same techniques (Richards et al. 2009b). Such catalogs have enabled investigations
not possible with the density of spectroscopic quasars, including cosmic magnification
(Scranton et al. 2005), quasar evolution (Myers et al. 2006), the integrated Sachs-Wolfe
Effect (Giannantonio et al. 2012), gravitational lenses (Oguri et al. 2006), binary quasars
(Hennawi et al. 2010), and dust in galaxies (Menard et al. 2010), particularly with rigorous
mitigation of the systematics (e.g., Leistedt et al. 2013; Pullen & Hirata 2013; Leistedt & Peiris
2014) that are inherent to a photometric quasar sample.

The goal of this paper is to extend our previous work as follows: 1) Providing
both optical and MIR data that can be used to help photometrically identify even larger
samples of quasars. 2) Expanding our pilot optical+MIR quasar selection from ~24 deg^2
in Richards et al. (2009b) to over 10,000 deg^2 by combining optical data from the SDSS
and MIR data from both WISE and Spitzer-IRAC. 3) Using these optical+MIR data to
discover new 3.5 < z < 5.0 quasars, even in areas that have already received significant
attention (e.g., COSMOS and Boötes). 4) Filling in the gaps of incomplete redshift from the
optically-targeted SDSS-I/II/III spectroscopic sample. 5) Providing a discovery framework
for clustering studies of high-z quasars within the upcoming Spitzer data within the area
of SDSS Stripe 82 as part of the SpIES project (Timlin, Ross, Richards et al. 2015, in
preparation).

Section 2 begins with a compilation of over 270,000 spectroscopically confirmed quasars
and over 1.5 million photometrically selected quasars in the SDSS footprint. These data
are the basis of our training set for further quasar discovery and we provide this catalog in
order to allow others to test their own quasar selection algorithms and to make meaningful
comparison of them to ours by using the same data set. In our work, we enhance these
data by matching between the SDSS-optical and the MIR from WISE and Spitzer, where we
have made conversions to put all of the MIR data on the same photometric system. Here
we emphasize the difference between our work (which concentrates on finding new type 1
quasars, particularly at high redshift) and that of Stern et al. (2012) and Assef et al. (2013),
which were designed to find both type 1 and type 2 AGNs using rigid magnitude and color
cuts to minimize contamination, at the expense of high-redshift quasars (Richards et al.
2009b; Assef et al. 2010).


In Section 3 we describe the construction of our optical+MIR training sets for
distinguishing quasars from stars and apply our selection algorithm to a test set of objects.
Our primary focus is over 3.5 < z < 5.0 where MIR-only selection is most incomplete
(Richards et al. 2009b; Assef et al. 2010; Messias et al. 2012); however, we also perform a
selection over 2.2 < z < 3.5 and 0 < z < 2.2 as our method can also improve upon optical-only
selection, which is typically incomplete at z ~ 2.7 and z ~ 3.5 (Richards et al. 2006;
Worseck & Prochaska 2011), and reveals lower-luminosity AGNs at z < 2.2 that optical
selection alone may fail to distinguish from compact galaxies.


In Section 4 we present our catalog, including photometric redshifts. Finally in Section 5
we make comparisons to previous work, finding that our method allows us to discover many
quasars in hard-to-reach redshift ranges when using either optical-only or MIR-only selection.
Our 3.5 < z < 5 targets are particularly important for constraining AGN feedback models
by examining the luminosity-dependence of high-redshift quasar clustering (Lidz et al.
2006), where current samples lack sufficient high-redshift objects over a significant range
in luminosity. We have an insufficient combination of depth and areal coverage to perform
this analysis with the current sample; however, such analysis can be performed with
Spitzer-IRAC observations of SDSS “Stripe 82” over ~110 deg^2 to a depth of ~6 μJy (Timlin,
Ross, Richards et al. 2015, in preparation). Section 5 concludes with a number counts and
luminosity function analysis of the catalog and a discussion of future work.

We report photometry primarily in AB magnitudes, where Spitzer-IRAC Channels 1-2
are given by [3.6] and [4.5], which are the nominal wavelengths of the bandpasses in microns.
For comparison with other work using Vega magnitudes we note that the conversions between
Spitzer-IRAC AB and Vega ([AB] - [Vega]) are 2.788, 3.255, 3.743 and 4.372 mag for IRAC
Channels 1-4, respectively.^1 For example, [3.6] - [4.5] (Vega) = [3.6] - [4.5] (AB) + 0.467.
For WISE, we adopt 2.699 and 3.339 as the conversions to AB from W1 and W2 Vega
magnitudes, respectively,^2 where the WISE central wavelengths are 3.4, 4.6, 12, and 22 μm
for W1, W2, W3 and W4, respectively. Cosmology-dependent parameters are determined
assuming H_0 = 70 km s^-1 Mpc^-1, Ω_m = 0.3 and Ω_Λ = 0.7, in general agreement with
WMAP results (e.g., Hinshaw et al. 2013).
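The magnitude conventions above can be captured in a few lines. The following is a minimal sketch, not part of any survey software: the function names and dictionary keys are illustrative, the IRAC Channel 3-4 labels ([5.8], [8.0]) are assumed from the nominal channel wavelengths, and the offsets are taken to mean m_AB = m_Vega + offset, which is the sign convention implied by the worked color example in the text.

```python
# AB - Vega offsets (mag) quoted in the text; assumed convention: m_AB = m_Vega + offset.
# The band labels used as dictionary keys are illustrative names.
AB_MINUS_VEGA = {
    "[3.6]": 2.788, "[4.5]": 3.255, "[5.8]": 3.743, "[8.0]": 4.372,  # Spitzer-IRAC
    "W1": 2.699, "W2": 3.339,                                        # WISE
}


def vega_to_ab(mag_vega, band):
    """Convert a Vega magnitude to AB for the given band."""
    return mag_vega + AB_MINUS_VEGA[band]


def ab_to_vega(mag_ab, band):
    """Convert an AB magnitude to Vega for the given band."""
    return mag_ab - AB_MINUS_VEGA[band]


def color_ab_to_vega(color_ab, band1, band2):
    """Convert a (band1 - band2) color from AB to Vega."""
    return color_ab - (AB_MINUS_VEGA[band1] - AB_MINUS_VEGA[band2])


# The text's example: [3.6] - [4.5] (Vega) = [3.6] - [4.5] (AB) + 0.467
print(round(color_ab_to_vega(0.0, "[3.6]", "[4.5]"), 3))  # 0.467
```

Keeping the offsets in one table makes it harder to apply an IRAC offset to a WISE band (or the reverse) by accident.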


2. The Data 

To conduct our analysis we require optical imaging data of a sample of objects that
require classification; such data will constitute our test set. Some subset of those data must
have already been spectroscopically classified (as quasars) and will form the basis of our
quasar training set. These training and test sets will be described more fully in Section 3.1.
Here we describe the origin of the data and the parameters determined from the data that
are used for classification by our algorithm. Section 2.1 presents the known quasar sample
used to build the training set, Section 2.2 describes the optical data, Section 2.3 discusses
the infrared data, while Section 2.4 explores the redshift, magnitude, and color distributions
of the matched optical-infrared data.


1 http://irsa.ipac.caltech.edu/data/COSMOS/gator_docs/scosmos_irac_colDescriptions.html

2 http://wise2.ipac.caltech.edu/docs/release/allsky/expsup/sec4_4h.html

2.1. Master Catalog of Quasars with SDSS Photometry


In order to optimally select new quasars, we need the largest possible database of extant
quasars with which one can build training sets. We construct such a catalog by gathering
samples of spectroscopically-confirmed quasars within the SDSS-I/II/III (York et al. 2000;
Eisenstein et al. 2011) footprint. Here we detail the input catalogs and the process used to
combine them. We will refer to this catalog throughout the paper as the “master quasar
catalog”.

We started with the hand-vetted quasar catalog that concluded the SDSS-I/II project,
specifically Table 5 from Schneider et al. (2010), where we have used the redshifts from
Hewett & Wild (2010) where available. The other large fraction of spectroscopic quasars
comes from the Sloan Digital Sky Survey-III: Baryon Oscillation Spectroscopic Survey
(SDSS-III/BOSS) project (Dawson et al. 2013), specifically those quasars cataloged by
Paris et al. (2014) as part of “Data Release 10”, where we used the “visual inspection”
redshifts.

In addition to the standard BOSS quasars, we include a sample of 851 quasars identified
on dates between late 2008 and early 2009 using Hectospec (Fabricant et al. 2005) on the
MMT. The original purpose of this “MMT” quasar sample was to investigate the faint end
of the quasar luminosity function in preparation for BOSS, and quasars were targeted using
deep optical data in Stripe 82 and MIR data from Spitzer where available. More details of
these MMT quasars are provided in Appendix C of Ross et al. (2012a). We include all of
these MMT quasars, instead of just those that were located in Stripe 82, which expands the
sample compared to the 444 quasars listed in Tables 14 and 15 of Ross et al. (2012a).

Next we add the full quasar catalog from the 2QZ project (Croom et al. 2004).^3 The 2dF
instrument provides another catalog input, namely that from the 2SLAQ project (Croom et al.
2009),^4 where we have included only objects labeled as any type of “QSO”. The 2dF
instrument has since been upgraded to the AAOmega instrument, which was used to observe objects
in our third catalog from the Anglo-Australian Telescope. Specifically, we include objects
from the AUS project (Croom et al. in preparation), including both a K-band limited
sample and a z > 2.8 selected sample.


We next incorporate quasar data from the AGES project (Kochanek et al. 2012),
specifically using data from their Tables 5, 6, and 7. We have excluded low-luminosity AGNs by
requiring qso = 1 from Table 5. Quasars from another deep, wide area, namely COSMOS
(Scoville et al. 2007b),^5 have also been included in our sample, where the data were limited
to type 1 objects (Lilly et al. 2007; Trump et al. 2009).


3 www.2dfquasar.org/Spec_Cat/cat/2QZ_6QZ_pubcat.txt
4 www.2slaq.info/2slaq_qso/2slaq_qso_public.cat


To increase the number of rare, very high-redshift quasars, we also include 65 z > 5.8
quasars from Fan et al. (2006) and Jiang et al. (2008). The master quasar catalog was built
before a large number of z ~ 5 quasars were cataloged in Stripe 82 by McGreer et al. (2013),
but we recover 49 of the 65 that are bright enough to have matching MIR photometry.


Our master quasar catalog is rounded out by a few smaller samples of objects meant
to extend the range of properties covered. This includes the “BROADLINE” objects from
Table 5 of Papovich et al. (2006), the z ~ 4 quasars from Table 5 of Glikman et al. (2010),
and KX-selected quasars at z > 1 from Maddox et al. (2012, Tables 4 and 6).


There may yet be some known type 1 quasars within the SDSS footprint that we have
not included in our master quasar catalog; however, they should mostly be small samples of
objects that are already represented or much brighter than the SDSS flux limits (e.g., 3C273
and most “PG” quasars from Schmidt & Green 1983).


All of the above objects are spectroscopically confirmed quasars; however, many more
likely quasars have been identified photometrically. As that information also has value in
considering identification of new quasars, we have included objects listed in the photometric
quasar catalogs of both Richards et al. (2009a, NBCKDE) and Bovy et al. (2011, XDQSO).


These individual tables are merged together and a flag is set to indicate the origin. The 
flag values run from 0 to 13 as follows, where spectroscopic redshifts from earlier catalogs in 
the list trump later catalogs when there is a duplication: SDSS, 2QZ, 2SLAQ, AUS, AGES, 
COSMOS, FAN, BOSS, MMT, NBCKDE, XDQSOZ, PAPOVICH, GLIKMAN, MADDOX. 
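The origin flag lends itself to a short decoding routine. The sketch below assumes the flag is stored as an integer bitmask in which bit k marks the k-th catalog in the list above (the table of columns describes SOURCEBIT as a bitwise flag running from 2^0 to 2^13); the function names are hypothetical, and the "lowest set bit wins" rule is our reading of the statement that earlier catalogs take precedence for redshifts.

```python
# Catalog names in the priority order given in the text (bit 0 = SDSS ... bit 13 = MADDOX).
SOURCES = ["SDSS", "2QZ", "2SLAQ", "AUS", "AGES", "COSMOS", "FAN", "BOSS",
           "MMT", "NBCKDE", "XDQSOZ", "PAPOVICH", "GLIKMAN", "MADDOX"]


def decode_sourcebit(sourcebit):
    """Return the names of all catalogs whose bit is set, in priority order."""
    return [name for bit, name in enumerate(SOURCES) if sourcebit & (1 << bit)]


def redshift_source(sourcebit):
    """Return the catalog whose redshift takes precedence (lowest set bit), or None."""
    names = decode_sourcebit(sourcebit)
    return names[0] if names else None


# An object found in both SDSS (bit 0) and BOSS (bit 7):
flag = (1 << 0) | (1 << 7)
print(decode_sourcebit(flag))  # ['SDSS', 'BOSS']
print(redshift_source(flag))   # SDSS
```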


For the benefit of those wishing to make use of this master catalog we make it available
in Table 1. The columns are as follows: 1) RA (degrees), 2) Dec (degrees), 3-7) SDSS run,
rerun, camcol, field, and id,^6 8) the SDSS morphology (OBJC_TYPE), 9-10) code indicating
SDSS data quality (OBJC_FLAGS and OBJC_FLAGS2), 11) SDSS Galactic EXTINCTION in all 5
bands, 12) the SDSS flux as measured from Point-Spread-Function fitting (in nanomaggies)
in all 5 bands, 13) the inverse variance of the PSF flux in all 5 bands, 14) the co-added SDSS
PSF flux for those objects observed in multiple epochs, 15) the inverse variance for column
14, 16) PSF_CLEAN_NUSE is an indication of whether there are multiple epochs of imaging
data (values larger than 1 indicate that we have used the “CLEAN” [i.e., co-added] values
of the PSF flux in our analysis), 17) ZBEST indicates the redshift determined from each of
the sources of data described in 18) SOURCEBIT (numbered 0-13 in the order given above),
19) indicates whether the SDSS object fell in the “uniform” selection area as described by
Richards et al. (2002), 20-21) codes from the AGES survey that we used to reject low-redshift
AGNs from our training set, 22-25) photometric redshift information from the NBCKDE
photometric quasar catalog (Richards et al. 2009a), 26-28) photometric redshift information
from the XDQSO photometric quasar catalog (Bovy et al. 2011).


5 irsa.ipac.caltech.edu/data/COSMOS/tables/spectra/

6 These and other SDSS-related information are described in more detail at
https://www.sdss3.org/dr9/imaging/imaging_basics.php.











Table 1. Master Quasar Catalog

Column  Name                Description
1       RA                  Right Ascension (J2000)
2       DEC                 Declination (J2000)
3       RUN                 SDSS run number, see http://classic.sdss.org/dr7/glossary/index.html
4       RERUN               SDSS rerun number
5       CAMCOL              SDSS camera column
6       FIELD               SDSS field number
7       ID                  SDSS ID number (within the field)
8       OBJC_TYPE           SDSS object type (stellar = 3, extended = 6)
9       OBJC_FLAGS          SDSS object flags, see http://classic.sdss.org/dr7/products/catalogs/flags.html
10      OBJC_FLAGS2         SDSS object flags
11      EXTINCTION          Magnitudes of Galactic extinction in ugriz
12      PSFFLUX             Point-spread-function flux in ugriz
13      PSFFLUX_IVAR        Inverse variance of point-spread-function flux in ugriz
14      PSFFLUX_CLEAN       Co-added point-spread-function flux in ugriz
15      PSFFLUX_CLEAN_IVAR  Inverse variance of co-added point-spread-function flux in ugriz
16      PSF_CLEAN_NUSE      Flag indicating whether co-added (CLEAN) flux should be used
17      ZBEST               Spectroscopic and photometric redshifts from the sources indicated by SOURCEBIT
18      SOURCEBIT           Bitwise flag from 2^0 to 2^13 indicating the redshift source as coming from SDSS, 2QZ,
                            2SLAQ, AUS, AGES, COSMOS, FAN, BOSS, MMT, NBCKDE, XDQSOZ, PAPOVICH, GLIKMAN,
                            MADDOX, respectively
19      SDSS_UNIFORM        Indicates whether the SDSS object fell in the “uniform” selection area, see Richards et al. (2002)
20      AGES_QSO            AGES flag, see Kochanek et al. (2012)
21      AGES_CODE06         AGES flag, see Kochanek et al. (2012)
22      KDE_ZPHOTLO         Minimum photometric redshift from Richards et al. (2009)
23      KDE_ZPHOTHI         Maximum photometric redshift from Richards et al. (2009)
24      KDE_ZPHOTPROB       Photometric redshift probability from Richards et al. (2009)
25      KDE_LOWZORUVX       Flag indicating a UV-excess or low-redshift source; Richards et al. (2009)
26      XDQSOZ_PEAKPROB     Peak of the redshift probability from Bovy et al. (2011)
27      XDQSOZ_PEAKFWHM     FWHM of the redshift peak from Bovy et al. (2011)
28      XDQSOZ_NPEAKS       Number of peaks in the Bovy et al. (2011) photo-z distribution


2.2. Optical Data 


Over more than 10 years, the SDSS used a sophisticated telescope (Gunn et al. 2006)
fitted with a large field-of-view camera (Gunn et al. 1998) to take exposures through ugriz
filters (Fukugita et al. 1996). For the training and testing sets in this paper, we use the
“Data Release 9” (DR9) versions of this SDSS imaging (Ahn et al. 2012). DR9 included the
latest astrometric and photometric calibrations for imaging in the original northern SDSS
footprint and in the southern footprint that was added as part of SDSS DR8 (Aihara et al.
2011). Specifically, we use the versions of the SDSS imaging provided in the calibObj or
“data sweep” files^7 that are discussed in Blanton et al. (2005). We limit the data sweeps
to only objects that are PRIMARY in SDSS imaging (e.g., see Table 5 of Stoughton et al.
2002), but do not further restrict our optical sources using cuts on image quality flags at
this stage (any additional flag cuts are described in the relevant sections of this paper). We
use such PRIMARY sources from the SDSS data sweep files as our test data and also match
our heterogeneous master training catalog of quasars (described in the previous section) to
PRIMARY objects from these data sweeps.

While the spectroscopic identifications that we tabulate have a heterogeneous origin,
one advantage of the catalog of quasars that we have built is that their optical imaging is
derived solely from the SDSS imaging camera (Gunn et al. 1998), providing a homogeneous
aspect to the data set.

All of the optical magnitudes reported in the catalog are asinh PSF magnitudes
(Lupton et al. 1999) corrected for dust extinction using the coefficients given by Schlafly & Finkbeiner
(2011). Fluxes are reported in nanomaggies without any dust extinction correction. The full
list of cataloged parameters is given in Table 1 for the master quasar catalog and Section 4
for our quasar candidate catalog; further information on each source is publicly available.
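As an illustration of how the tabulated nanomaggie fluxes relate to asinh magnitudes, the following sketch applies the Lupton et al. (1999) formula. The softening parameters are the standard published SDSS values, which are not quoted in the text, and the function names are hypothetical; the extinction argument is assumed to be the per-band value in magnitudes, as in the EXTINCTION column.

```python
import math

# SDSS softening parameters b, relative to the zero-point flux.
# Standard published SDSS values; they are not quoted in the text.
SOFTENING_B = {"u": 1.4e-10, "g": 0.9e-10, "r": 1.2e-10, "i": 1.8e-10, "z": 7.4e-10}


def asinh_mag(flux_nmgy, band):
    """Asinh magnitude for a flux in nanomaggies (1 nanomaggie = 1e-9 of the zero point)."""
    b = SOFTENING_B[band]
    f_ratio = flux_nmgy * 1e-9
    return -2.5 / math.log(10) * (math.asinh(f_ratio / (2 * b)) + math.log(b))


def dereddened_asinh_mag(flux_nmgy, extinction_mag, band):
    """Asinh magnitude corrected for Galactic extinction (extinction in magnitudes)."""
    return asinh_mag(flux_nmgy, band) - extinction_mag


# For bright sources the asinh magnitude approaches 22.5 - 2.5 log10(flux in nanomaggies).
print(round(asinh_mag(1000.0, "r"), 3))  # 15.0
```

Unlike a logarithmic magnitude, this definition remains finite for zero or negative measured fluxes, which matters for faint sources near the survey limit.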


2.3. Infrared Data 


To create our MIR data set, we begin by merging large areas of relatively deep
Spitzer-IRAC data (Fazio et al. 2004) with shallower, but wider-area WISE data (Wright et al.
2010). This has the advantage of allowing us to probe both a wide area and, over a
fraction of that area, relatively deep flux limits.


7 http://data.sdss3.org/datamodel/files/PHOTO_SWEEP/RERUN/calibObj.html

The WISE data come from the ALLWISE data release,^8 where we have kept only objects
with both W1 and W2 detections and have excluded objects that do not meet the following
quality control criteria:^9 w1flg <= 1 && w2flg <= 1 (to avoid sources with bad pixels or
that are upper limits), cc_flags == '0000' (to avoid objects affected by diffraction spikes,
ghosts, latent images, and scattered light), ext_flg == 0 (to limit to MIR point sources), and
w1snr > 2 && w2snr > 2 (to limit to objects that are well-detected in both W1 and W2). By
matching known SDSS quasars to ALLWISE, we estimate that these cuts cull 9.6%, 3.0%,
0.6%, and 0.2% of real sources, respectively. This incompleteness is corrected in our number
counts and luminosity function analysis in Section 5.
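The quality cuts above translate directly into a row-level filter. This is a minimal sketch, assuming each source is available as a mapping keyed by the ALLWISE column names used in the text; the function names are hypothetical. The second function estimates the combined loss under the additional assumption (ours, not stated in the text) that the four quoted cull fractions apply successively and independently.

```python
def passes_allwise_cuts(src):
    """Return True if a source (mapping of ALLWISE-style columns) passes the quality cuts."""
    return (
        src["w1flg"] <= 1 and src["w2flg"] <= 1      # no bad pixels or upper limits
        and src["cc_flags"] == "0000"                # no spikes, ghosts, latents, scattered light
        and src["ext_flg"] == 0                      # MIR point sources only
        and src["w1snr"] > 2 and src["w2snr"] > 2    # detected in both W1 and W2
    )


# Fractions of real sources culled by each cut, as quoted in the text.
CULL_FRACTIONS = [0.096, 0.030, 0.006, 0.002]


def retained_fraction(culls):
    """Fraction of sources kept if the cuts act successively and independently (assumption)."""
    kept = 1.0
    for c in culls:
        kept *= 1.0 - c
    return kept


good = {"w1flg": 0, "w2flg": 1, "cc_flags": "0000", "ext_flg": 0, "w1snr": 12.3, "w2snr": 8.1}
print(passes_allwise_cuts(good))                       # True
print(round(1.0 - retained_fraction(CULL_FRACTIONS), 3))  # 0.13
```

Under that assumption the combined loss is about 13%, in line with the flag-cut losses discussed with the diagnostic figures.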


The Spitzer catalogs include 1) the SWIRE data (Lonsdale et al. 2003), 2) the XFLS
data (Lacy et al. 2005), 3) the COSMOS data (Sanders et al. 2007a), 4) our own pilot sample
of Spitzer-IRAC data centered on known high-z quasars in SDSS Stripe 82 (data tabulated in
Krawczyk et al. 2013), 5) the SDWFS data in the Boötes field (Eisenhardt et al. 2004), and
6) the SERVS data (Mauduit et al. 2012). The SWIRE, XFLS, COSMOS, and SDWFS data
are the same data used in Richards et al. (2009b); see that paper for more details. Boötes
data are taken from Ashby et al. (2009), specifically SDWFS_ch1_stack.v34.txt, adopting
the aperture-corrected 4" (diameter) flux densities. This catalog corresponds to a depth
of 12 x 30 s and we have limited to objects detected in both Channels 1 and 2 and with
SExtractor flags of 0 or 2. Vega magnitudes have been converted to μJy. The SERVS data
are described in detail in Mauduit et al. (2012).


Our Stripe 82 data include pointed observations of over 300 known z > 2 quasars in the
SDSS Stripe 82 field (Annis et al. 2014; Jiang et al. 2014) and were processed in a manner
similar to that which was used for the SWIRE data set. Photometry for these sources is
tabulated in Krawczyk et al. (2013). We report fluxes in a 1.9" aperture radius.


For all of the above data sets, we have included all objects that are not flagged by
SExtractor (Bertin & Arnouts 1996) as blended in either IRAC Channel 1 or Channel 2 and
we have applied no explicit flux limits to the individual catalogs. Flux densities have been
converted to μJy if the original data have other units. We report errors that have been
increased by 3% (10% for XFLS) in quadrature since SExtractor only reports the RMS at
the image position; this is consistent with Donley et al. (2012, Section 4).
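The error inflation described above can be sketched as follows. We assume here that the 3% (or 10%) term is a fraction of the measured flux density added in quadrature to the SExtractor error; the text does not spell this out, and the function name is hypothetical.

```python
import math


def inflate_flux_error(flux, sigma, frac=0.03):
    """Add a fractional systematic term (frac * flux) in quadrature to the reported error.

    Assumes the percentage quoted in the text is a fraction of the measured flux density.
    """
    return math.sqrt(sigma ** 2 + (frac * flux) ** 2)


# A 100 uJy source with a 4 uJy reported error:
print(round(inflate_flux_error(100.0, 4.0), 3))             # 5.0   (3% term)
print(round(inflate_flux_error(100.0, 4.0, frac=0.10), 3))  # 10.77 (10% term, as for XFLS)
```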


We would like to be able to use MIR measurements from both WISE and Spitzer;
however, photometry from these two spacecraft is on different photometric systems. There
is strong similarity in the two shortest wavelength filters of the systems, but a correction
needs to be applied. As such, the WISE data have been converted from Vega magnitudes
on the WISE system to μJy in the Spitzer-IRAC system using color terms appropriate for
each of the individual objects (based on their W1 - W2 colors). This process is important
for allowing us to treat the WISE and Spitzer data equivalently. As the W3 and W4 data are
much shallower than W1 and W2, we have only tabulated the W1 and W2 photometry and
we have only kept objects with detections in both of those bands.


8 http://wise2.ipac.caltech.edu/docs/release/allwise/

9 See http://wise2.ipac.caltech.edu/docs/release/allwise/expsup/sec2_1a.html for detailed explanation of
these parameters.


As an illustration of our conversion of the WISE Vega system to Spitzer AB, we convert
the W1 - W2 (Vega) = 0.8 color-cut used by Stern et al. (2012) to the Spitzer AB system.
First we find that

$$W_1(\mathrm{Vega}) - W_2(\mathrm{Vega}) = (W_1(\mathrm{AB}) - 2.699) - (W_2(\mathrm{AB}) - 3.339) \qquad (1)$$

so that the above cut is W1 - W2 (AB) = W1 - W2 (Vega) - 0.64 = 0.16. We have then created
a look-up table for the conversion of WISE AB magnitudes to Spitzer AB magnitudes as
a function of color (assuming a power-law SED). In general these corrections are small
for W1 and W2; see Wright et al. (2010, Table 1). We find that at W1 - W2 (AB) = 0.16:
[3.6] = W1(AB) - 0.028 and [4.5] = W2(AB) + 0.013, so that W1 - W2 (Vega) = 0.8 is equivalent
to [3.6] - [4.5] (AB) = 0.119. Similarly we can convert a W2 (Vega) = 15.05 magnitude cut
(at this color) to Spitzer AB as follows: [4.5](AB) = W2(Vega) + 3.339 + 0.013 = 18.402.
We illustrate these cuts in Section 3.1 where for the sake of simplicity we have ignored
the color-dependence of the magnitude limit. As the agreement with Spitzer photometry
has significantly improved for the ALLWISE data release as compared to the older, All-Sky
WISE data, we have not further corrected for the remaining offsets. The typical ALLWISE
limits are 54 μJy in W1 or 16.9 in Vega mags and 71 μJy in W2 or 15.9 in Vega mags, but
depend on location due to WISE’s polar orbit. In AB mags, these limits are 19.6 and 19.3.
See the ALLWISE Explanatory Supplement^10 for a discussion of how the Spitzer and WISE
differences, [3.6] - W1 and [4.5] - W2, behave as a function of magnitude and for information
on how the WISE sensitivity changes with coordinate.
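The arithmetic in this worked example can be checked with a short script. This is only a sketch: the function names are hypothetical, the two Spitzer correction terms are the values quoted in the text for W1 - W2 (AB) = 0.16 (the full color-dependent look-up table is not reproduced here), and the flux conversion assumes the standard AB zero point of 3631 Jy.

```python
import math

W1_AB_OFFSET, W2_AB_OFFSET = 2.699, 3.339  # m_AB - m_Vega for W1 and W2 (from the text)
D36, D45 = -0.028, 0.013  # [3.6] - W1(AB) and [4.5] - W2(AB), valid at W1 - W2 (AB) = 0.16 only


def w1w2_vega_to_ab(color_vega):
    """Convert a W1 - W2 color from Vega to AB."""
    return color_vega + (W1_AB_OFFSET - W2_AB_OFFSET)


def w1w2_ab_to_irac(color_ab, d36=D36, d45=D45):
    """Convert W1 - W2 (AB) to [3.6] - [4.5] (AB) using fixed correction terms."""
    return color_ab + d36 - d45


def ab_mag_from_ujy(flux_ujy):
    """AB magnitude for a flux density in microjansky (AB zero point of 3631 Jy)."""
    return -2.5 * math.log10(flux_ujy * 1e-6 / 3631.0)


color_ab = w1w2_vega_to_ab(0.8)
print(round(color_ab, 3))                    # 0.16
print(round(w1w2_ab_to_irac(color_ab), 3))   # 0.119
print(round(15.05 + W2_AB_OFFSET + D45, 3))  # 18.402
print(round(ab_mag_from_ujy(54.0), 1), round(ab_mag_from_ujy(71.0), 1))  # 19.6 19.3
```

The last line reproduces the quoted ALLWISE limits in AB magnitudes from the 54 and 71 μJy flux limits.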


We generate a single merged MIR catalog by matching the above data sets using a
2" matching radius with priority being given to objects from the individual catalogs as
follows: SERVS, SWIRE, COSMOS, SDWFS, XFLS, Stripe82, and WISE. That is, a SERVS
detection will overwrite a SWIRE detection. Only one Spitzer detection of each object was
allowed and a flag was set to indicate which catalog the photometry comes from. However,
if there are data from both WISE and Spitzer, we have also kept the WISE data for reference.
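A priority merge of this kind can be sketched in a few lines. The version below is an illustrative brute-force implementation, not the procedure actually used to build the catalog: the data layout (lists of dictionaries with "ra" and "dec" in degrees), the "origin" and "wise_ref" keys, and the function names are all assumptions, and a real catalog of this size would need a spatial index rather than a pairwise search.

```python
import math

# Catalog priority order given in the text (highest priority first).
PRIORITY = ["SERVS", "SWIRE", "COSMOS", "SDWFS", "XFLS", "Stripe82", "WISE"]


def sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine formula); inputs in degrees."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


def merge_mir(catalogs, radius=2.0):
    """Merge catalogs in priority order; keep WISE photometry for reference on Spitzer matches."""
    merged = []
    for name in PRIORITY:
        for src in catalogs.get(name, []):
            match = next((m for m in merged
                          if sep_arcsec(m["ra"], m["dec"], src["ra"], src["dec"]) <= radius),
                         None)
            if match is None:
                merged.append({**src, "origin": name})   # flag records the source catalog
            elif name == "WISE" and match["origin"] != "WISE":
                match["wise_ref"] = src                  # retain WISE data alongside Spitzer
    return merged


catalogs = {
    "SWIRE": [{"ra": 10.0, "dec": 0.0}],
    "SERVS": [{"ra": 10.0002, "dec": 0.0}],
    "WISE": [{"ra": 10.0001, "dec": 0.0}, {"ra": 11.0, "dec": 0.0}],
}
print([m["origin"] for m in merge_mir(catalogs)])  # ['SERVS', 'WISE']
```

In the example the SERVS detection wins over the SWIRE detection of the same object, and the matching WISE entry is attached for reference rather than added as a separate source.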


10 http://wise2.ipac.caltech.edu/docs/release/allwise/expsup/sec2_3a.html

This final MIR catalog is then matched to the SDSS-III imaging data using a 2" matching
radius. No explicit flux limits have been applied. Dust extinction has been corrected as
A_[3.6] = 0.197 E(B - V) and A_[4.5] = 0.180 E(B - V), consistent with Cardelli et al. (1989)
as reported by the NASA/IPAC Infrared Science Archive.^11
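The extinction correction above amounts to a one-line operation per band. A minimal sketch, with hypothetical function names and dictionary keys, using only the two coefficients quoted in the text:

```python
# A_band / E(B - V) coefficients quoted in the text for the two IRAC bands.
A_OVER_EBV = {"[3.6]": 0.197, "[4.5]": 0.180}


def deredden_mag(mag, ebv, band):
    """Remove Galactic extinction from a magnitude given E(B - V)."""
    return mag - A_OVER_EBV[band] * ebv


def deredden_flux(flux, ebv, band):
    """Equivalent correction applied to a flux density (any linear flux unit)."""
    return flux * 10 ** (0.4 * A_OVER_EBV[band] * ebv)


# For E(B - V) = 0.1 the [3.6] correction is only about 0.02 mag:
print(round(deredden_mag(18.0, 0.1, "[3.6]"), 4))  # 17.9803
```

At the low reddening typical of high Galactic latitude fields these MIR corrections are small compared with those in the optical bands.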


The full SDSS-III footprint lacks deep near-IR imaging, since 2MASS (Skrutskie et al.
1997) is too shallow to provide counterparts for the bulk of our quasar sample. However, when
available, near-IR data are very useful for improving photometric redshift (photo-z) estimates.
Thus, while we do not use near-IR data for our quasar selection algorithm, we do match our
optical catalog to near-IR catalogs from the regions of sky covered by the UKIRT Infrared
Deep Sky Survey (UKIDSS; Lawrence et al. 2007) and the Vista Hemisphere Survey (VHS;
McMahon 2012). We used a matching radius of 1" and included only objects that have
measurements in each of J, H, and K. While these near-IR data are not simultaneous
with the optical or MIR data, which causes some scatter in the color distributions, even
simultaneous observed-frame multi-wavelength (and thus multi-distance scale) data would
not be simultaneous in the rest-frame.

Figure 1 shows the relative limits of the MIR and near-IR data as compared to the
optical for a typical quasar spectral energy distribution (Krawczyk et al. 2013). High-z
quasars found from SDSS photometry with i < 20 are expected to be detected in ALLWISE.
They should also be detected by UKIDSS and would be detected by GALEX in the bluest
bandpass. Quasars closer to the SDSS photometric limit (for single-epoch data) can be much
fainter than the ALLWISE, UKIDSS, and VHS limits, which will limit the completeness of
this catalog. Fainter quasar candidates are limited by the depth of ALLWISE (or the area
of Spitzer).


2.4. Diagnostics 

Here we provide some diagnostic plots to illustrate the range of optical and MIR
properties spanned by our choice of data. Figure 2 shows the redshift distribution for all of the
objects in our master quasar catalog, including those objects where only optical photometry
is available and those objects where MIR photometry exists. The peaks in redshift in this
figure represent selection effects. The SDSS DR7 quasar sample peaked at z ~ 1.5, while
the SDSS DR10 quasar selection was optimized for z ~ 2.5, with contamination coming at
z ~ 0.8. Most of the losses of IR-matched objects at low redshift are due to the flag cuts
imposed upon the WISE data. At high redshift, the difference between the focus of our work
(not relying on MIR color cuts) and that of Assef et al. (2013) (which utilizes MIR color
cuts) is readily apparent.


11 irsa.ipac.caltech.edu

Fig. 1. Relative limits of the multi-wavelength data. The bars indicate the effective
wavelength of the bandpasses, but are not scaled to represent the size of the bandpass.
Red indicates MIR data from ALLWISE and Spitzer-SWIRE, green indicates the limits of
UKIDSS and VHS, blue shows the depth of both single-epoch and multi-epoch (Stripe 82)
SDSS photometry, while cyan gives the limits of the GALEX AIS survey. Two example
quasar spectral energy distributions (from Krawczyk et al. 2013) are given for z = 2 (solid
black line) and z = 4 (dashed black line), both corrected for Lyman series extinction and
normalized to i = 20, which is roughly the limit of SDSS spectroscopy for high redshift (it
is i = 19.1 for low redshift, which is shown in gray).


Figure 3 shows the magnitude distribution of the objects in the master catalog. The
peaks in the distribution are caused by a combination of magnitude limits: the SDSS DR7
quasar sample had a z < 3 magnitude limit of i < 19.1 and a z > 3 limit of i < 20.2,
while SDSS DR10 probed to g < 21.85 (i ~ 22). Although adding MIR photometry is very
powerful for AGN selection, it is also responsible for reducing the completeness to known
quasars by a factor of ~2 by i = 20. Up to i ~ 19, over 80% of our quasar sample includes
IR measurements from WISE or Spitzer. Most of the losses at bright magnitudes occur due
to our attempts to restrict ourselves to the highest quality WISE data as noted above. The
fraction of bright quasars with IR matches is roughly consistent with the expected loss of
~13% of sources due to the flag cuts on the WISE data and the fraction found by Wu et al.
(2012). That is, the curves in the insets of Figures 2 and 3 should be shifted up by 0.13
to correct for objects removed due to flag cuts. The dotted lines show the effect of the
Assef et al. (2013) reliability cuts relative to the objects in our training set (dashed lines).


Fig. 2. Redshift distribution of the full spectroscopic quasar sample (solid line; 274,329
quasars), for the IR-matched sample (dashed line; 157,701 quasars, the parent sample of
our quasar training sets), and for the IR-matched sample with the 75% reliability limit from
Assef et al. (2013) imposed (dotted line). Beyond redshift 4.5 the distributions have been
scaled by a factor of 10 to better show the high-z part of the samples. The inset gives the
ratio of the dashed line to the solid line and the dotted line to the solid line. Losses at low
redshift are dominated by flag cuts (~13%, independent of redshift). Further losses at high
redshift are primarily due to implicit (dashed line) or explicit (dotted line) magnitude limits
of the sub-samples as can be seen in Figure 3.


Figure 4 shows the quasar colors as a function of redshift. In addition to the data points, 
we also depict the mean colors as a function of redshift for both the full optical sample and the 
more limited optical+MIR sample. Overall, there is good agreement between the samples. 


3. Classification 


In Section 2 we tabulated quasars both with and without MIR photometry; for the 
remainder of this paper we will consider only the optical+MIR data set. After building 
training and test sets (Section 3.1) in a similar manner to that described in Richards et al. 
(2009b), we will apply the same Bayesian selection algorithm (Section 3.2) described in our 
previous papers, and then we will describe the selection results (Section 3.3). 


3.1. Training and Test Sets 


Starting with the matched optical+MIR photometry (both for known quasars and all 
SDSS-DR10 sources), we create the test set (objects to be classified) along with the quasar 
and non-quasar (“star”) training sets as follows. 


We first restrict the data to objects that are expected to have “clean” photometry, 
which, for our purposes, means that none of the following SDSS imaging quality flags 
are set: INTERP_PROBLEMS, DEBLEND_PROBLEMS, NOT_BINNED1, EDGE, 
BRIGHT, SATUR, MOVED, BLENDED, NODEBLEND, and NOPROFILE. These flags are fully defined 
in Table 9 of Stoughton et al. (2002) except for INTERP_PROBLEMS, DEBLEND_PROBLEMS and 
MOVED, which are detailed in Richards et al. (2002) and/or are further discussed in Appendix 
A of Ross et al. (2012a). Objects must also have flux values of < 1000 nanomaggies (m_AB > 
15) in all bands to be included, as brighter fluxes can lead to saturated pixels. However, 
we have made this cut before applying any dust extinction corrections, so objects that are 
intrinsically brighter than m_AB = 15, but that are not saturated in the images, are kept. 


If good co-added (multi-epoch) photometry is reported in all bands (as indicated by 
PSF_CLEAN_NUSE > 0; see footnote 12), then we retain the co-added fluxes (and errors); 
otherwise the single-epoch fluxes are used. To handle the problem of negative fluxes we 
have used the asinh magnitude prescription of Lupton et al. (1999). 


12 Again see http://data.sdss3.org/datamodel/files/PHOTO_SWEEP/RERUN/calibObj.html for 
descriptions of the format of the data sweeps files that we use. 


Fig. 3.— i-band magnitude distribution of the full spectroscopic quasar sample (solid black 
line), for the IR-matched sample (dashed black line; the parent sample of our quasar 
training sets) and for the IR-matched sample with the 75% reliability limit from Assef et al. 
(2013) imposed (dotted black line). The inset shows the ratio of the latter two samples to 
the full sample, demonstrating that our matching to WISE (and/or Spitzer) photometry is 
over 80% complete to i ~ 19 (dashed line) and that our greater sensitivity to high-redshift 
quasars relative to Assef et al. (2013; dotted line) is largely due to probing deeper. The gray 
histograms in the main panel show the distribution in [4.5] for our full training set (solid) 
and after imposing the 75% reliability cut of Assef et al. (2013) (dotted). 


Fig. 4.— Color vs. redshift for the spectroscopic quasar sample. Black (linear) contours 
and dots show the color distributions of the individual objects. The top four panels include 
all of the spectroscopic objects; the bottom two panels contain only those matching to the 
IR sample. Lines give the mean color-redshift relations (which are used to compute the 
photometric redshifts). The red line is for all of the optical data, while the gray line shows 
the mean for the objects that additionally have IR matches. In the top four panels there 
is good agreement between the red and gray lines (and thus between the quasars with and 
without matching IR photometry). 


Initially our classification included both point and extended (optical) sources, as in 
our previous catalogs. Later we will restrict our analysis to just the point sources. At this 
point, the test set consists of all the photometry from all of the “good” point and extended 
sources described above. No further restrictions are placed on the objects that we attempt 
to classify. The classification parameters are the set of adjacent colors determined from each 
of the 5 optical and 2 mid-IR magnitudes that we catalog, specifically: u - g, g - r, r - i, 
i - z, z - [3.6], and [3.6] - [4.5]. In all there were 50,225,630 objects in the test set. 
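
The asinh magnitudes and adjacent colors described above can be illustrated with a short sketch. This is not the code used for the catalog. The softening parameters are the commonly quoted SDSS values and are an assumption here, as are the function names and the dictionary keys `ch1` and `ch2` (standing for [3.6] and [4.5]).

```python
import math

# Softening parameters b for the SDSS ugriz bands, in units of the zero-point
# flux. These are the commonly quoted SDSS values (an assumption; they are not
# given in this section).
SOFTENING = {"u": 1.4e-10, "g": 0.9e-10, "r": 1.2e-10, "i": 1.8e-10, "z": 7.4e-10}

# Band order used to form the six adjacent colors (hypothetical key names).
BAND_ORDER = ["u", "g", "r", "i", "z", "ch1", "ch2"]


def asinh_mag(flux_nmgy, band):
    """asinh magnitude (Lupton et al. 1999) for a flux in nanomaggies.

    Unlike a logarithmic magnitude, this stays finite for zero or negative flux.
    """
    b = SOFTENING[band]
    x = flux_nmgy * 1e-9  # flux relative to the zero-point flux (maggies)
    return -2.5 / math.log(10) * (math.asinh(x / (2.0 * b)) + math.log(b))


def adjacent_colors(mags):
    """Return [u-g, g-r, r-i, i-z, z-[3.6], [3.6]-[4.5]] from a dict of magnitudes."""
    return [mags[a] - mags[b] for a, b in zip(BAND_ORDER, BAND_ORDER[1:])]
```

For bright sources the asinh magnitude converges to the usual logarithmic magnitude, so a flux of 1000 nanomaggies corresponds to a magnitude of about 15, matching the saturation limit quoted earlier.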

The quasar training set is the subset of the test set for which there is a match in the 
master quasar catalog with a spectroscopic redshift (i.e., we have not included photometric 
quasars) as noted in Section 2.1. The “stars” training set is again a subset of the test 
set. Here sources matched to known (spectroscopic) quasars are excluded. The final stars 
training set is a randomly selected sample of ~700,000 objects (taking those objects where 
the hundredths and thousandths digits of the IRAC CH2 flux density were “01”). The vast 
majority of these objects lack spectroscopic classification as stars, thus these are not only 
stars, but can be (compact) galaxies (and previously unidentified quasars); see the discussion 
of the cleaning process below. Thus “stars” in this context is shorthand meaning optical 
point sources that have not been classified as quasars in the redshift range we are trying to 
select. 


In practice we have actually made three pairs of quasar and star training sets as quasar 
colors change considerably at high redshift and it is best to treat them as separate 
populations. Thus, the quasar training sets are created by parsing through the quasars and keeping 
only those within the redshift range of interest. Quasars outside of that redshift range are 
put into the “stars” training set. The three ranges used are 0 < z < 2.25 (11984 quasars), 
2.15 < z < 3.55 (45561 quasars), and 3.45 < z < 5.5 (3321 quasars), where the overlap is 
to minimize the loss of objects near the redshift boundaries and we stop at z = 5.5 since 
selecting higher redshifts generally requires additional care (Fan et al. 2006). We will refer 
to objects selected from the use of training sets focusing on these redshift ranges as “low-z”, 
“mid-z”, and “high-z” throughout the rest of the paper. 
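
The construction of the three training-set pairs can be sketched as follows. This is a minimal illustration, not the code used for the catalog. The function and variable names are hypothetical, and treating the range boundaries as strict inequalities is an assumption.

```python
# Redshift windows quoted in the text; the overlaps are deliberate.
REDSHIFT_RANGES = {
    "low-z": (0.0, 2.25),
    "mid-z": (2.15, 3.55),
    "high-z": (3.45, 5.5),
}


def split_quasars(redshifts, label):
    """Split known quasars for one classification run.

    Returns (in_range, out_of_range): quasars inside the window form the
    quasar training set, and those outside are added to the "stars" set.
    """
    zmin, zmax = REDSHIFT_RANGES[label]
    in_range = [z for z in redshifts if zmin < z < zmax]
    out_of_range = [z for z in redshifts if not (zmin < z < zmax)]
    return in_range, out_of_range
```

Because the windows overlap, a quasar at z = 2.2 enters both the low-z and the mid-z quasar training sets, which is how objects near the boundaries avoid being lost.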

Figure 5 presents the optical colors (and a magnitude) of the objects in our training 
sets. For the star training set, we show only the low-z training set, which includes quasars 
above z = 2.2. All three quasar training sets are shown. Similarly, Figure 6 gives the MIR 
colors of the training set objects. Here we include the color-magnitude cuts (solid black line) 
used by Stern et al. (2012) to select their quasar sample in addition to the (somewhat more 
inclusive) 75% reliability selection (solid yellow curve) of Assef et al. (2013). Comparison 
of these curves to the distribution of high-redshift quasars illustrates their bias against such 
objects as shown in Section 2.4. This reflects a conscious choice to be sensitive to both type 
1 and type 2 AGNs without significant contamination from inactive galaxies. Our approach 
is complementary in that we will endeavor to be as complete as possible to high-redshift type 
1 quasars, at the expense of type 2 quasars. The green lines in Figure 6 depict the cuts that 
we will use to reduce stellar contamination from the test sets as shown in Section 3.3. We 
duplicate them here to emphasize that they would throw out relatively few of the training 
set objects. 




Fig. 5.— Optical colors of training set objects. Point sources are in red, extended sources in 
gray, high-z quasars in black, mid-z quasars in cyan, and low-z quasars in blue. Extended 
sources are not explicitly included in the training set but are shown here for reference given 
that separation of point and extended sources is not perfect (particularly at faint 
magnitudes). 

Fig. 6.— MIR colors of training set objects. Point sources are in red, extended sources in 
gray, high-z quasars in black, mid-z quasars in cyan, and low-z quasars in blue. The dashed 
black line shows the detection limit as a function of color for a theoretical object with 
[3.6] = 20.5. The solid black lines indicate the color and magnitude limits of the Stern et al. 
(2012) selection in AB magnitude space, while the yellow curve gives the 75% reliability 
selection from Assef et al. (2013). The green lines in the bottom panels give our own cuts 
that are intended to reduce stellar contamination; these are not applied to the training sets, 
but are shown here for comparison to the test set output. 


3.2. Application of the Algorithm 


As described in more detail in Richards et al. (2004, 2009a,b), our algorithm requires 
that we compute a “bandwidth” that best describes the range of colors of each object class. 
This is akin to determining the best bin size to represent one’s data in a histogram: too 
many bins leaves too few objects in each bin, while too few bins over-smooths the data 
and causes a loss of information. Thus, the bandwidth is essentially a smoothing parameter 
for the color distributions. These bandwidths are determined by a self-classification step, 
choosing the bandwidth that yields the most complete recovery of known quasars with the 
smallest contamination from stars. As in our previous work, we first perform an initial 
self-classification of the training sets, then we throw out objects initially classified as quasars 
from the star training set (since we expect our star sample to be contaminated by those very 
objects that we wish to recover where other algorithms have failed). The final bandwidth 
is determined from the original quasar training set and the “cleaned” star training set. An 
example “heat map” showing the optimization of the bandwidths for self-classification of 
stars and quasars in the high-z training sets is shown in Figure 7. Optimal bandwidths were 
computed for each of the low-z, mid-z, and high-z training sets. 
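
The bandwidth search can be illustrated with a toy example. This is a sketch under stated assumptions, not the algorithm of Richards et al.: it uses a single color instead of the full color space, a brute-force Gaussian kernel density estimate instead of kd-trees, and a leave-one-out self-classification. All function names are hypothetical. The rating is the product of self-completeness and efficiency, as in the caption of Figure 7.

```python
import itertools
import math


def kde(x, sample, h, skip=None):
    """Gaussian kernel density estimate at x; `skip` leaves one sample point out."""
    total, n = 0.0, 0
    for j, s in enumerate(sample):
        if j == skip:
            continue
        total += math.exp(-0.5 * ((x - s) / h) ** 2)
        n += 1
    return total / (n * h * math.sqrt(2.0 * math.pi))


def rating(quasars, stars, h_q, h_s, star_prior):
    """Self-classify both training sets and return completeness * efficiency."""
    q_prior = 1.0 - star_prior
    # Known quasars that are recovered as quasars.
    true_pos = sum(
        kde(x, quasars, h_q, skip=i) * q_prior > kde(x, stars, h_s) * star_prior
        for i, x in enumerate(quasars)
    )
    # "Stars" that are (mis)classified as quasars.
    false_pos = sum(
        kde(x, quasars, h_q) * q_prior > kde(x, stars, h_s, skip=i) * star_prior
        for i, x in enumerate(stars)
    )
    selected = true_pos + false_pos
    completeness = true_pos / len(quasars)
    efficiency = true_pos / selected if selected else 0.0
    return completeness * efficiency


def best_bandwidths(quasars, stars, grid, star_prior):
    """Grid search for the (quasar, star) bandwidth pair with the highest rating."""
    return max(
        itertools.product(grid, grid),
        key=lambda hh: rating(quasars, stars, hh[0], hh[1], star_prior),
    )
```

In this toy setting an over-smoothed pair of bandwidths lets the stellar prior dominate everywhere, so no quasars are recovered and the rating drops to zero.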



Fig. 7.— Graphical depiction of the search for optimal bandwidths for the star and quasar 
training sets. The color bar indicates the “rating” of each bandwidth pair, which is 
determined by the product of the self completeness and efficiency. The optimal bandwidth in this 
case (the high-z training set) was found to be (0.23,0.18) for (quasars,stars). 


The only other free parameter in our classification is the Bayesian stellar prior, which 
represents our expectation of what fraction of objects are really stars. For low-z classification 
this was set to 98% (that is, we expect 98% of the objects in the test set to be “stars”). For 
the mid-z classification it was set to 99.9%, reflecting the lower density of quasars in this 
redshift range as compared to lower redshift. Finally, for high-z classification, it was set to 
99.99%. These numbers are estimated from the ratio of the number of objects in the test 
set to the number of objects in the training set, which provides a conservative estimate of 
the quasar fraction. These star priors demonstrate the level to which quasar classification 
is a “needle in a haystack” problem that requires methods more sophisticated than simple 
color cuts. Note that small changes in the prior only make small changes in the number 
of quasars selected. For example, in the low-z case, lowering the stellar prior by 1% does 
not increase the number of quasar candidates by 1% of the test set (roughly a half million 
objects); rather we find that it changes the number by roughly 1% of the quasar candidates 
(~7000 objects). 


3.3. Classification Results 


Here we present the results of our classification. This process is an extension of the 8-D 
(optical plus MIR colors) selection described in Richards et al. (2009b), using the algorithm 
described in more detail by Richards et al. (2004, 2009a). 


Our algorithm can roughly be summarized as choosing objects for which 

P(colors|quasar) P(quasar) > P(colors|star) P(star),    (2) 

where P(star) is the stellar prior, P(quasar) is 1 - P(star), and P(colors|quasar) is the 
probability of an object having certain colors given that it is known to be a quasar (and similarly 
for the stars). In practice we have performed this classification in a discrete binary fashion 
using kd trees; see Gray et al. (2005) and Riegel et al. (2008). However, we compute the 
continuous probabilities for all of the objects that satisfy the initial binary selection 
criterion and we report those values in the final catalog as they can sometimes be useful in post 
assessment of the classification accuracy. 
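
The decision rule of Equation 2 and the corresponding continuous probability can be written out in a few lines. This is an illustrative sketch only. The function names are hypothetical, and the two density arguments stand for P(colors|quasar) and P(colors|star) as returned by some kernel density estimate.

```python
def is_quasar_candidate(qso_density, star_density, star_prior):
    """Binary selection: P(colors|quasar) P(quasar) > P(colors|star) P(star)."""
    return qso_density * (1.0 - star_prior) > star_density * star_prior


def quasar_posterior(qso_density, star_density, star_prior):
    """Continuous probability that an object is a quasar, from Bayes' theorem."""
    q = qso_density * (1.0 - star_prior)
    s = star_density * star_prior
    return q / (q + s)
```

An object passes the binary selection exactly when this posterior exceeds 0.5. The same pair of densities can pass with the low-z prior of 98% but fail with the high-z prior of 99.99%, which shows how strongly the stellar prior suppresses candidates in the rarest class.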


This process identified 1,317,677 objects as low-z quasar candidates, 804,342 as mid-z 
quasar candidates, and 48,324 objects as high-z quasar candidates. These amount to 2.6%, 
1.6% and 0.1% of the test set objects. These percentages are larger than expected from the 
priors; however, these include contaminants that we have worked to remove using some cuts 
as described below and the algorithm is not strongly sensitive to differences at this level. 


We have reduced the amount of contamination from stars and galaxies by restricting 
our analysis to objects classified as point sources in the SDSS photometry and by requiring 
that all the candidates lie to the right (redward) of both of the following two cuts: 

([4.5] < 16.0 && [3.6] - [4.5] < -0.1) || ([4.5] > 16.0 && [3.6] - [4.5] < ([4.5] - 15.2)/-8.0)    (3) 

(i < 16.5 && [3.6] - [4.5] < -0.1) || (i > 16.5 && [3.6] - [4.5] < (i - 15.7)/-8.0).    (4) 

We further restrict our candidates to objects more than 15 degrees from the Galactic plane 
and that have less than 1 magnitude of extinction in the u-band, A_u < 1.0 (A_i < 0.4). 
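
A possible implementation of these cuts is sketched below. It rests on an interpretation that should be checked against the published equations: the inequalities of Equations 3 and 4 are read as describing the blueward (rejected) side of each cut, so a candidate is kept only if it satisfies neither. The function and argument names are hypothetical.

```python
def blueward_of_cut(ref_mag, color, knee, offset):
    """True if the [3.6]-[4.5] color lies on the blue (rejected) side of a cut.

    The cut is flat at -0.1 brighter than `knee` and slopes as
    (ref_mag - offset) / -8.0 fainter than `knee`; the two pieces meet there.
    """
    if ref_mag < knee:
        return color < -0.1
    return color < (ref_mag - offset) / -8.0


def passes_color_cuts(i_mag, ch1, ch2):
    """Keep objects redward of both the [4.5]-based and the i-based cut."""
    color = ch1 - ch2  # [3.6] - [4.5]
    rejected_eq3 = blueward_of_cut(ch2, color, knee=16.0, offset=15.2)
    rejected_eq4 = blueward_of_cut(i_mag, color, knee=16.5, offset=15.7)
    return not (rejected_eq3 or rejected_eq4)


def passes_all_cuts(i_mag, ch1, ch2, gal_lat_deg, a_u, is_point_source):
    """Color cuts plus the point-source, Galactic latitude and extinction limits."""
    return (
        is_point_source
        and passes_color_cuts(i_mag, ch1, ch2)
        and abs(gal_lat_deg) > 15.0
        and a_u < 1.0
    )
```

With this reading, an object with [3.6] - [4.5] = 0.5 is retained at any magnitude, while a bright object with [3.6] - [4.5] = -0.5 is rejected as a likely star.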


After these cuts we are left with 885,503 quasar candidates, including 748,839 low-z 
candidates, 205,060 mid-z candidates, and 13,060 high-z candidates, where the totals do not 
match due to objects being selected in more than one redshift range. These numbers can 
be contrasted with the 5546 quasar candidates from our previous attempt at optical+MIR 
classification (Richards et al. 2009b). Four of the mid-z objects and five of the low-z objects 
are duplicates that result from matching of multiple IR sources to the same optical source; 
we have not resolved these duplicates into a single object in the interest of completeness. 


Figures 8 and 9 mimic Figures 5 and 6, but here we plot the quasar candidates rather 
than the known quasars. Comparison of these distributions to the cuts used by Stern et al. 
(2012) (solid black lines in Fig. 9) and Assef et al. (2013) (solid yellow lines in Fig. 9) 
demonstrates the improved sensitivity of our method to type-1 quasars (particularly those that are 
faint with red optical colors) over using MIR color-magnitude cuts alone. While this vastly 
increases the number of high-z quasar candidates, it does come at the cost of excluding type 
2 quasar candidates. 


4. The Catalog 

Our catalog is presented in Table 2. Of the 885,503 quasar candidates, 733,713 lack 
spectroscopic confirmation (and 305,623 are objects that we have not previously classified 
as photometric quasar candidates). We find that 150,453 objects are already known to be 
quasars. This was determined by matching the candidates not only to the known quasars 
in the master quasar catalog that defined our training set but also to the full SDSS-I/II/III 
spectroscopic database. Only 743 candidates (< 0.1%) have been classified as stars. A total 
of 589 objects are classified as galaxies; however, 175 of those have z > 0.5 and thus are 
likely to be AGNs. Indeed many of the objects classified as z > 0.5 galaxies appear in 
the hand-vetted SDSS quasar catalogs; this reflects the sensitivity of our method to 
low-luminosity AGNs in compact galaxies. The confirmed stars and galaxies do not occupy any 
unique parameter space that would allow them to be easily distinguished as contaminants. 
Overall, the candidate list appears to be quite robust and visual inspection of the optical 
SDSS images confirms this impression. 

Fig. 8.— Optical colors of test set objects selected as quasar candidates. Contours/points 
and colors are as in Figure 5: high-z quasars in black, mid-z quasars in cyan, and low-z 
quasars in blue. Training set “stars” are shown in red (for point sources) and gray (for 
extended sources). 

Fig. 9.— MIR colors of test set objects selected as quasar candidates. Contours/points 
and colors are as in Figures 6 and 8. The dashed black line shows the detection limit as a 
function of color for a theoretical object with [3.6] = 20.5. The solid black lines indicate the 
color and magnitude limits of the Stern et al. (2012) selection in AB magnitude space, while 
the yellow curve gives the 75% reliability selection from Assef et al. (2013). The green lines 
in the bottom panels give our own cuts, as defined in Equations 3 and 4, that are intended 
to reduce stellar contamination. 















Table 2. Optical+MIR Photometric Quasar Catalog 


Column  Name            Description 

1       RA              Right Ascension (J2000) 
2       DEC             Declination (J2000) 
3       CLASS           Spectral classification (QSO, GALAXY, STAR, CELG, ??, or U) 
4       ZSPEC           Spectroscopic redshift (if known) 
5       U_MAG           SDSS u-band AB magnitude, corrected for Galactic extinction 
6       G_MAG           SDSS g-band AB magnitude, corrected for Galactic extinction 
7       R_MAG           SDSS r-band AB magnitude, corrected for Galactic extinction 
8       I_MAG           SDSS i-band AB magnitude, corrected for Galactic extinction 
9       Z_MAG           SDSS z-band AB magnitude, corrected for Galactic extinction 
10      CH1_MAG         3.6 micron AB magnitude, corrected for Galactic extinction 
11      CH2_MAG         4.5 micron AB magnitude, corrected for Galactic extinction 
12      U_MAG_ERR       Error on u-band magnitude 
13      G_MAG_ERR       Error on g-band magnitude 
14      R_MAG_ERR       Error on r-band magnitude 
15      I_MAG_ERR       Error on i-band magnitude 
16      Z_MAG_ERR       Error on z-band magnitude 
17      CH1_MAG_ERR     Error on 3.6 micron magnitude 
18      CH2_MAG_ERR     Error on 4.5 micron magnitude 
19      U_FLUX          SDSS u-band flux density in nanomaggies 
20      G_FLUX          SDSS g-band flux density in nanomaggies 
21      R_FLUX          SDSS r-band flux density in nanomaggies 
22      I_FLUX          SDSS i-band flux density in nanomaggies 
23      Z_FLUX          SDSS z-band flux density in nanomaggies 
24      CH1_FLUX        3.6 micron flux density in microJy 
25      CH2_FLUX        4.5 micron flux density in microJy 
26      U_FLUX_ERR      Error in u-band flux density 
27      G_FLUX_ERR      Error in g-band flux density 
28      R_FLUX_ERR      Error in r-band flux density 
29      I_FLUX_ERR      Error in i-band flux density 
30      Z_FLUX_ERR      Error in z-band flux density 
31      CH1_FLUX_ERR    Error in 3.6 micron flux density 
32      CH2_FLUX_ERR    Error in 4.5 micron flux density 
33      YAPERMAG3       Y-band Vega magnitude from UKIDSS or VHS 
34      JAPERMAG3       J-band Vega magnitude from UKIDSS or VHS 
35      HAPERMAG3       H-band Vega magnitude from UKIDSS or VHS 
36      KSAPERMAG3      K-band Vega magnitude from UKIDSS or VHS 
37      YAPERMAG3ERR    Error in Y-band magnitude 
38      JAPERMAG3ERR    Error in J-band magnitude 
39      HAPERMAG3ERR    Error in H-band magnitude 
40      KSAPERMAG3ERR   Error in K-band magnitude 
41      FUV_MAG         GALEX FUV magnitude (AB) 
42      FUV_MAG_ERR     GALEX NUV magnitude (AB) 
43      NUV_MAG         Error in FUV magnitude 
44      NUV_MAG_ERR     Error in NUV magnitude 
45      GI_SIGMA        Indicator of distance from mean g-i color at ZPHOTBEST 
46      EXTINCTU        Extinction in SDSS u-band 
47      STAR_DENS       Star Density from KDE algorithm 
48      QSO_DENS        Quasar Density from KDE algorithm 
49      ZPHOTMIN        Minimum photometric redshift (ugriz) 
50      ZPHOTBEST       Best photometric redshift (ugriz) 
51      ZPHOTMAX        Maximum photometric redshift (ugriz) 
52      ZPHOTPROB       Probability of ZPHOTBEST being between min and max 
53      ZPHOTMINJHK     Minimum photometric redshift (ugrizJHK) 
54      ZPHOTBESTJHK    Best photometric redshift (ugrizJHK) 
55      ZPHOTMAXJHK     Maximum photometric redshift (ugrizJHK) 
56      ZPHOTPROBJHK    Probability of ZPHOTBESTJHK being between min and max 
57      LEGACY          Indicates if object is in the SDSS Legacy footprint 
58      SDSS_UNIFORM    Indicates if object was selected according to Richards et al. (2002) 
59      PRIMTARGET      SDSS primary target selection flag; see Richards et al. (2002) 
60      PM              Proper motion in milliarcseconds per year 
61      DUPBIT          Bitwise flag indicating low-z (2^0), mid-z (2^1), and high-z (2^2) sources 


The columns in the catalog are as follows. 1) RA (degrees; J2000), 2) Declination 
(degrees; J2000), 3) the classification of the object from matching to known objects (QSO, 
STAR, GALAXY, CELG, and “??”; see footnote 13) based on existing spectroscopy, or “U” for unknown if 
we know of no spectroscopy for the source, and 4) the known redshift. Columns 5-11 give the 
ugriz optical AB (asinh) magnitudes (corrected for Galactic extinction) along with the [3.6] 
and [4.5] mid-IR AB magnitudes (also corrected for Galactic extinction). Columns 12-18 give 
the errors on these magnitudes. Columns 19-32 give the SDSS-III, WISE, and Spitzer flux 
densities and errors where the optical values are measured in nanomaggies (as reported by 
the SDSS data sweeps file that we use) and the mid-IR values have been converted to μJy; no 
extinction correction is applied to these values. Columns 33-40 give the YJHK magnitudes 
and errors from the UKIDSS or VHS surveys (where available). Columns 41-44 give the 
far-UV and near-UV magnitudes and errors from GALEX (where available); no Galactic 
extinction corrections have been applied. Column 45 indicates whether the g - i color is within 
1σ (0.68), 2σ (0.95), or 3σ (0.99) of the mean color for quasars at the predicted redshift. 
Outliers are an indication of either bad photometric redshifts or non-quasar contaminants. 
Column 46 is the u-band extinction from Schlafly & Finkbeiner (2011); extinctions in other 
wavebands can be derived from this value. Columns 47 and 48 are the star and quasar 
probabilities as determined by the kernel density estimation used in our primary selection 
criterion. Columns 49-52 use the optical and MIR photometry to tabulate the minimum, 
best, and maximum photometric redshift along with the probability of being between the 
minimum and maximum values as described in more detail in Section 4.1. Columns 53-56 are 
the same photometric redshift values but now adding JHK photometry to the optical and 
MIR. Column 57 indicates whether the object is within the “legacy” SDSS footprint, which 
is useful for statistical analysis. Column 58 indicates whether the object was in the uniform 
targeting area for the quasar target selection algorithm described in Richards et al. (2002). 
In Column 59 we give the flag (if set) from SDSS-DR7 quasar targeting, where Richards et al. 
(2002) and Schneider et al. (2010) provide details on the values of these flags, which can 
be used as a secondary indicator of quasar likelihood. Column 60 gives the proper motion 
(PM) in mas per year in a similar manner as discussed in Richards et al. (2009a), based on 
Munn et al. (2004), and can also be used as a secondary indicator of quasar likelihood. 
Finally, column 61 is a bit-wise flag that indicates whether the object was selected as a low-z 
(DUPBIT & 2^0), mid-z (DUPBIT & 2^1), or high-z (DUPBIT & 2^2) source (or a combination 
thereof). 
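
As a brief illustration of how the DUPBIT flag might be decoded by a catalog user, consider the following sketch. The constant and function names are hypothetical; only the bit values 2^0, 2^1, and 2^2 come from the column description.

```python
# Bit values from the DUPBIT column description.
LOW_Z = 1 << 0   # 2^0: selected with the low-z training sets
MID_Z = 1 << 1   # 2^1: selected with the mid-z training sets
HIGH_Z = 1 << 2  # 2^2: selected with the high-z training sets


def decode_dupbit(dupbit):
    """Return the list of redshift ranges in which an object was selected."""
    names = ((LOW_Z, "low-z"), (MID_Z, "mid-z"), (HIGH_Z, "high-z"))
    return [name for bit, name in names if dupbit & bit]
```

A value of 5, for example, marks an object selected by both the low-z and the high-z classifications.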


13 See Section 5.3 for an explanation of the “CELG” and “??” classifications. 

4.1. Photometric Redshifts 


We have used the photometric-redshift algorithm described by Richards et al. (2001) 
and Weinstein et al. (2004), extending it to include the mid-IR photometry from Spitzer and 
WISE, and, in some cases, the near-IR photometry from VHS and UKIDSS. In short, this 
algorithm seeks to minimize the distance between the colors of an unknown source and the 
mean colors as shown in Figure 4. For luminous quasars this method is superior to template 
fitting (e.g., Assef et al. 2010, Fig. 10) as it primarily picks up on the high-equivalent-width 
emission line features rather than spectral breaks (although at high-z the Lyα break leads to 
improved photometric redshifts even with our method). Careful selection of templates can 
lead to improved results as shown by Salvato et al. (2009), particularly for host-dominated 
AGNs. 
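
The idea of minimizing the distance between observed colors and the mean color-redshift relation can be sketched as a simple chi-squared search over a redshift grid. This is a simplified illustration, not the algorithm of Richards et al. (2001) and Weinstein et al. (2004), which we do not reproduce here. The function name is hypothetical, and the mean relation in the usage example below consists of made-up numbers.

```python
def photo_z(colors, color_errs, z_grid, mean_colors):
    """Return the grid redshift whose mean colors are closest to the observed ones.

    colors      -- observed adjacent colors of one object
    color_errs  -- 1-sigma uncertainties on those colors
    z_grid      -- list of trial redshifts
    mean_colors -- mean_colors[k] is the list of mean quasar colors at z_grid[k]
    """
    def chi2(k):
        return sum(
            ((c - m) / e) ** 2
            for c, m, e in zip(colors, mean_colors[k], color_errs)
        )

    best = min(range(len(z_grid)), key=chi2)
    return z_grid[best]


# Toy usage with an invented two-color relation (illustrative values only).
toy_grid = [0.5, 1.0, 1.5]
toy_relation = [[0.1, 0.2], [0.3, 0.1], [0.6, 0.0]]
toy_estimate = photo_z([0.32, 0.12], [0.05, 0.05], toy_grid, toy_relation)
```

Degeneracies arise in such a scheme whenever two widely separated redshifts have similar mean colors, since their chi-squared values are then nearly equal.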


Figure 10 shows the photometric vs. spectroscopic redshifts for all three samples. Note 
that there is some overlap between the samples (as designed to ensure that objects with 
redshifts near the edges of the training set redshift windows are not lost). The left panel 
reveals where there are photometric redshift degeneracies in the sample; however, the right 
panel shows that the vast majority have well-estimated photometric redshifts and that 
catastrophic outliers are a minority. We find that 90.9%, 82.7% and 85.7% of known quasars have 
photometric redshifts within Δz = ±0.3, for high-z, mid-z, and low-z candidates, 
respectively. Candidates can be restricted to more robust photometric redshifts by making a cut 
on ZPHOTPROB, which gives the probability that the true redshift is between the minimum 
and maximum reported values. 
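
The accuracy statistic quoted above is the fraction of known quasars whose photometric and spectroscopic redshifts agree to within 0.3. A minimal sketch of that calculation follows; the function name and the sample values in the test are hypothetical.

```python
def fraction_within(z_phot, z_spec, tol=0.3):
    """Fraction of objects with |z_phot - z_spec| <= tol."""
    pairs = list(zip(z_phot, z_spec))
    good = sum(abs(zp - zs) <= tol for zp, zs in pairs)
    return good / len(pairs)
```

The same function could be applied after a cut on ZPHOTPROB to see how the fraction changes for the more robust subsample.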


It is not our goal herein to rigorously investigate the nature of the degeneracies in 
Figure 10. However, as one example, we consider the degeneracy between z ~ 0.75 and 
z ~ 2.25. Here the Lyman-α forest is not yet strong enough in u to overcome similarities 
between the general optical/UV and MIR spectral slopes, Mg II vs. C IV in g, Hβ vs. Mg II 
in z, and Paα vs. Paγ in [3.6]. JHK data can break that degeneracy as J - K spans the 1 μm 
transition between the optical and IR at low redshift while it samples the optical slope at high 
redshift. We specifically find that adding JHK data improves the overall photo-z accuracy 
to 93% (virtually eliminating catastrophic errors). However, near-IR data of sufficient depth 
are only available over a fraction of the area surveyed; Euclid data (Laureijs et al. 2012) will 
be very welcome in this regard. 


Fig. 10.— (Left:) Photometric vs. spectroscopic redshift for all 3 samples; blue: low-z, green: 
mid-z, red: high-z. As this presentation highlights the catastrophic outliers at the expense 
of the well-determined photometric redshifts we also present a histogram of the differences 
between the spectroscopic and photometric redshifts in the right panel. This shows that 
most objects have well-estimated redshifts. 


Another way we can determine the photometric redshift accuracy is to look at the 
color-redshift relation using the photometric redshifts of our objects. Figure 11 shows the 
distribution of g - i color versus photometric redshift for our candidates. Photometric redshift 
degeneracies can produce semi-discrete features where one redshift is preferentially selected. 
Objects where the g - i color is within the 99% confidence limit at the best-fit photometric 
redshift (GI_SIGMA) are highlighted in gray. These objects are likely to be the most robust 
candidates and are expected to have the most accurate photometric redshifts. Objects 
outside of this 99% confidence interval are likely to be contaminants, have erroneous photometric 
redshifts or have interesting spectral features (highly dust reddened, broad absorption lines, 
etc.). For example, the objects with colors bluer than the mean g - i color at photometric 
redshifts of z ~ 4.8 and z ~ 5.5 are unlikely to be at those redshifts. However, they may 
well be quasars at z ~ 4-4.5. Alternatively, if they are indeed quasars, they could be at 
much lower redshift but have dust reddening or absorption troughs that make them appear 
like higher redshift quasars. 


5. Analysis 

5.1. Comparison of Selection Methods 


A benefit of our selection method is that it can take full advantage of data from 
Spitzer during the post-cryogen exploration phase of the mission. In such cases, we only have 
3.6 μm and 4.5 μm measurements. This keeps us from being able to perform the “wedge” selection 
that has proven so successful (Lacy et al. 2004a; Stern et al. 2005, 2007) because WISE is 
not deep enough in W3 and W4 relative to our optical data. However, our method allows us 
to probe to much fainter IR limits using only 2 bands since we also have matched optical 
photometry. This process enables us to improve upon MIR-only selection (Stern et al. 2012; 
Assef et al. 2013) (at least within the SDSS footprint) by helping to remove the MIR bias 
against 3.5 < z < 5.0 quasars. In a similar vein, our method is potentially more complete 
and more efficient at faint magnitudes than variability selection (e.g., Butler & Bloom 2011), 
where the optical photometry in any single epoch of imaging is noisy. While the power-law 
method used by Donley et al. (2012) results in more reliable MIR classification, quasars are 
not necessarily power-laws in the MIR (and are not always monotonic), so that method is 
more incomplete than that presented herein with regard to those objects that do not fit a 
power-law template (to within the errors). 


Fig. 11.— g - i color vs. photometric redshift for our quasar candidates. Black (linear) 
contours and dots are all the candidates; gray contours and dots represent objects that have 
colors that are within the 99% confidence limits of the mean quasar color-redshift relation 
(red line). Outliers may be contaminants or have erroneous photometric redshifts. 


We note that the Bovy et al. (2011) algorithm should perform as well as ours 
if it were rebuilt to include the stellar locus in the MIR (as opposed to applying color 
cuts before/after running the optical selection algorithm). One utility of the Bovy et al. 
(2011) method is that meaningful probabilities can be easily and rapidly built on a 
per-object basis. This allows for the alternative approach of constructing a fully probabilistic or 
extremely complete catalog, which is a less appropriate catalog to use for direct statistical 
analyses but which can be used to, e.g., match low-probability objects in the optical+MIR 
to AGN selected at other wavelengths (see DiPompeo et al., 2015, in preparation, for just 
such a catalog). By contrast, the catalog we have built is deliberately efficient (or “pure”) 
and therefore more appropriate for statistical analyses given good characterization of the 
incompleteness. 


An obvious question is what our method has gained over making simple color cuts. We
illustrate this with two examples of MIR-only cuts and a cut involving both optical and
MIR data. Richards et al. (2009a, Figure 10) illustrates the trade-off between completeness
and contamination for a simple [3.6] - [4.5] color cut. The standard W1 - W2 > 0.8 cut,
which equates to [3.6] - [4.5] > 0.119 as discussed above, would recover 80% of the quasar
candidates compiled herein, with most of the losses being high-redshift candidates. The
total number of test set objects passing such a cut is 1.85M. If all of our candidates were
quasars and all of the remaining objects within those 1.85M were contaminants, then the
contamination of such a cut would be 60%. Restricting just to point sources leaves only
1M targets, but that still would represent a contamination of 30%. Thus such a cut would
be neither optimally complete nor optimally efficient. If we wanted a more complete quasar
set, a better cut would be [3.6] - [4.5] > -0.1, which achieves 95% completeness to our
quasar candidates. However, it obviously comes with significantly greater contamination:
86% overall and 55% for point sources.
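
The contamination figures quoted above follow from simple bookkeeping, which can be sketched as below. This is only an illustration of the arithmetic under the stated worst-case assumption (every catalog candidate is a quasar, everything else passing the cut is a contaminant). The function name is ours, and the point-source case assumes that the same 80% of candidates is recovered.

```python
def cut_contamination(n_selected, n_candidates, completeness):
    """Contamination of a color cut, assuming every catalog candidate is a
    real quasar and every other object passing the cut is a contaminant."""
    n_recovered = completeness * n_candidates
    return 1.0 - n_recovered / n_selected


N_CANDIDATES = 885_503  # total quasar candidates in the catalog

# W1 - W2 > 0.8: 80% of candidates recovered among 1.85M selected objects.
all_sources = cut_contamination(1.85e6, N_CANDIDATES, 0.80)

# Same cut restricted to point sources: roughly 1M objects selected
# (assumes the recovered fraction of candidates is unchanged).
point_sources = cut_contamination(1.0e6, N_CANDIDATES, 0.80)

print(f"all sources:   {all_sources:.0%}")
print(f"point sources: {point_sources:.0%}")
```

Rounded to the precision used in the text, these reproduce the quoted values of roughly 60% and 30%.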


Better yet would be to combine the optical and MIR color information as we have in
our KDE selection. A number of combinations are possible, but a simple cut of i - [4.5] >
(g - i) - 1 recovers 99% of our candidates. With that comes nearly 95% contamination, as
more than 18M other objects are also selected by this cut. Most of that contamination is
from normal galaxies, as restricting to point sources reduces the contamination to 50%. A
more restrictive cut to reduce the contamination is possible, but not without a commensurate
reduction in completeness.
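
For concreteness, the combined optical+MIR cut can be written as a one-line predicate. This is a minimal sketch: the function and argument names are hypothetical, and the magnitudes are assumed to be on the same system used for the colors in the text.

```python
def passes_optical_mir_cut(g_mag, i_mag, mag_4p5):
    """True if the object satisfies i - [4.5] > (g - i) - 1.

    Argument names are illustrative; mag_4p5 is the [4.5] magnitude.
    """
    return (i_mag - mag_4p5) > (g_mag - i_mag) - 1.0


# An object that is red in i - [4.5] and blue in g - i passes the cut.
print(passes_optical_mir_cut(g_mag=20.5, i_mag=20.0, mag_4p5=18.0))
# An object that is blue in i - [4.5] and red in g - i fails it.
print(passes_optical_mir_cut(g_mag=18.0, i_mag=17.0, mag_4p5=17.5))
```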


5.2. Creating Robust Subsamples 

In order to further compare our candidates and selection algorithm to others, it is helpful
to first identify the most robust subsamples possible. To that end we consider the effects of
star-galaxy separation, previous SDSS targeting flags, proper motion, and the presence of
GALEX detections.

Particularly at high-z the robustness of our candidates depends on SDSS star-galaxy
separation (as we might expect high-z quasars to be point sources). The morphological
classification is thought to be 95% correct at r ~ 21 (Annis et al. 2014), where this has been
explored in more detail by Scranton et al. (2002). Figure 1 of Scranton et al. (2002) shows
that, as S/N degrades, galaxies are more likely to have small concentration indices and thus
be classified as stars. “Point” sources fainter than 22nd mag have a significant probability of
being galaxies; in poor seeing it is closer to 21st mag. As such, we do not consider any i > 22
sources to be robust high-z candidates (in the absence of other confirming information) and
sources with 21 < i < 22 deserve some caution.


In the case of relatively bright sources, the Richards et al. (2002) SDSS quasar target
selection flags can be used to identify candidates that are particularly likely (or unlikely).
As such, we have included those target flags (in the field PRIMTARGET) for sources where
the SDSS-DR7 flag value was non-zero. Objects flagged as QSO_FAINT (PRIMTARGET &
0x02000000) are sources that otherwise met the SDSS-DR7 selection criteria, but were just
below the flux limit for spectroscopic follow-up. On the other hand, objects flagged as
QSO_REJECT (PRIMTARGET & 0x20000000) are in regions of color space known for high
contamination. Based on the known quasars and the color cuts that defined this flag, objects
with this flag set that do not have z_phot ~ 2.4 are likely to be less robust candidates.
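
The flag tests described above are ordinary bitmask operations. The sketch below uses the two bit values quoted in the text; the helper names are ours, and the tolerance used to decide whether a photometric redshift is "near 2.4" is a purely hypothetical choice, since the text does not quantify it.

```python
QSO_FAINT = 0x02000000   # met the color criteria but below the flux limit
QSO_REJECT = 0x20000000  # in a region of color space with high contamination


def target_flags(primtarget):
    """Decode the two PRIMTARGET bits discussed in the text."""
    return {
        "QSO_FAINT": bool(primtarget & QSO_FAINT),
        "QSO_REJECT": bool(primtarget & QSO_REJECT),
    }


def is_less_robust(primtarget, z_phot, tol=0.3):
    """Flag QSO_REJECT objects whose z_phot is not close to 2.4.

    `tol` is an assumed illustrative tolerance, not a value from the paper.
    """
    return bool(primtarget & QSO_REJECT) and abs(z_phot - 2.4) > tol


print(target_flags(0x22000000))
print(is_less_robust(QSO_REJECT, z_phot=1.0))
```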


In Richards et al. (2009a) we were able to remove some contaminants by identifying
objects with high proper motions (Munn et al. 2004) and we have included the proper motion
for those objects with quality proper motion measurements (having small errors and at
least 6 epochs of data; see the discussion in Richards et al. 2009a). Using the same cuts as
Richards et al. (2009a) removes 160 known quasars, which is just 0.25% of the quasars with
quality proper motion measurements, yet it cuts 59 of the 280 known stars (21.1%).
These criteria further cut 478 unknown objects (0.73%) as compared to the 163 expected if
all of those objects were quasars. Overall, we find that many fewer objects have large proper
motion than in Richards et al. (2009a), which we attribute to the current catalog being less
contaminated by stars.

We have not used UV data from GALEX in our selection or photometric redshift analysis,
but we have further matched our catalog to GALEX data in order to identify contaminants
and redshift errors. Specifically we matched our candidate quasars to both the MIS
and AIS GALEX catalogs as compiled by Bianchi et al. (2005), excluding sources with a
near-UV (NUV) artifact flag. We then tabulate the NUV and FUV (far-UV) magnitudes
(AB) in addition to their errors. This matching can be used to weed out low-z interlopers
from among the high-z candidates. Specifically, real high-z quasars are relatively unlikely
to be GALEX sources (particularly fainter sources). Alternatively, lower-redshift sources
that we have misclassified as high-z quasar candidates are much more likely to be detected
by GALEX in the UV. We find that 101 of 9283 (1.1%) known quasars in our sample with
z_spec > 3 are detected by GALEX, as compared to 313 of the 9547 objects (3.3%) with
z_phot > 3 but that have low probability (< 0.8) of being at z > 3. Thus a GALEX detection
for a high-z candidate suggests that the candidate may not be robust.

The end result of these investigations is the addition of a number of parameters to our
catalog that can be used to identify the most robust candidates. For our purposes, we will
formally define “robust” candidates as those having ZPHOTPROB > 0.8 and abs(GI_SIGMA) <=
0.95. There are 517586 candidates satisfying these criteria. Of those only 717 (0.14%) are
known non-quasars, whereas 114120 are known quasars.

For high-z candidates (3.5 < z < 5) we further restrict the most robust sources to
non-detections in GALEX and i < 22. There are 10955 such sources, of which 7874 are unknown;
6779 of these have not been previously identified by us as photometric quasar candidates. 
Only 79 are non-quasar contaminants, while 2890 of the 3002 known quasars (96%) indeed 
have z > 3. 
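
The two definitions of robustness given above can be expressed as simple predicates. This is a sketch only: ZPHOTPROB and GI_SIGMA are the catalog fields named in the text, while the function names and the `i_mag` and `galex_detected` arguments are hypothetical stand-ins for however a user stores the i-band magnitude and the GALEX match.

```python
def is_robust(zphotprob, gi_sigma):
    """Robust candidate: ZPHOTPROB > 0.8 and abs(GI_SIGMA) <= 0.95."""
    return zphotprob > 0.8 and abs(gi_sigma) <= 0.95


def is_robust_high_z(zphotprob, gi_sigma, i_mag, galex_detected):
    """Additional restrictions applied to candidates selected as high-z:
    no GALEX detection and i < 22."""
    return is_robust(zphotprob, gi_sigma) and not galex_detected and i_mag < 22.0


# Known non-quasar fraction among the robust candidates quoted in the text.
known_nonquasar_fraction = 717 / 517586

print(is_robust(0.9, -0.5), is_robust_high_z(0.9, 0.5, 21.5, False))
print(f"{known_nonquasar_fraction:.2%}")
```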


5.3. COSMOS and Bootes 


One way to judge the utility of this catalog is to compare it to areas for which there
is particularly dense spectroscopy. One such example is the COSMOS field (Sanders et al.
2007). In addition to the COSMOS spectroscopy discussed earlier, we also compared to
Prescott et al. (2006), which further identifies objects in the COSMOS field. We recover
75 of the 95 quasars cataloged by them. Thirteen of these 75 were not identified as quasars
in the master catalog and we have updated their classifications in our catalog. Only 3 of our
objects match to galaxies from Prescott et al. (2006) while no objects matched to stars.


This comparison suggests that our catalog is relatively complete to known COSMOS
quasars and has relatively little contamination. Yet our catalog has nearly as many new
quasar candidates within the COSMOS field as have been confirmed by spectroscopy. Within
the area bounded by the COSMOS Spitzer data, we find 547 quasar candidates in total. Of
these 266 are known quasars, 3 are known galaxies, 1 is a known star, 32 are known compact
emission line galaxies (CELG), 5 have spectra that are difficult to classify (given as “??” in
the catalog), and 240 are unknown. CELG is a designation that we have chosen for those
objects that are classified as narrow line in the COSMOS spectroscopy but generally show
signs of being star forming galaxies rather than being AGN powered. They are all fainter than
i = 21 and likely come into the sample due to a breakdown of SDSS star-galaxy separation
as noted above. Of the unknown objects, only 95 are robust candidates as described above
(20 with z_phot > 3.5 and i < 22). The lower-quality candidates have i ~ 22 and are at
the limit of our selection method. Of the known quasars, only 5 have z > 3.5 and 47 have
2.2 < z < 3.5.

We can further compare our candidates to X-ray sources in the COSMOS field. The
November 2011 update of the 53-field XMM-Newton data table analyzed in Brusa et al.
(2010) contains 2000 X-ray sources. There are 264 matches (to within 1") to our catalog, 28
of which are unknown (16 robust). However, there are 176 additional unknown candidates
(64 robust) from our catalog without X-ray matches that we deem within the X-ray footprint
by virtue of there being an X-ray source within 240" (i.e., they are quasar candidates but
were not detected in the X-ray). Of the robust candidates, 17 are z > 3.5 candidates with
i < 22. Comparing the candidates matched to X-ray sources and those not matched, we find
that the average i-band magnitude of the matches is 20.62, while for the non-matches it is
21.96. In terms of photometric redshift, the X-ray matches have a mean value of 1.28 as
compared to 2.88 for the non-matches.
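
The matching logic described above (a 1" match radius, and a 240" radius to decide whether an unmatched candidate lies within the X-ray footprint) can be sketched as follows. This is a brute-force illustration with hypothetical function names and made-up coordinates, not the procedure actually used to build the catalog; a real cross-match over large catalogs would use a spatial index.

```python
import math

MATCH_RADIUS = 1.0        # arcsec, match radius quoted in the text
FOOTPRINT_RADIUS = 240.0  # arcsec, footprint criterion quoted in the text


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(math.sqrt(min(1.0, a)))) * 3600.0


def classify_candidate(candidate, xray_sources):
    """Classify a (ra, dec) candidate against a list of (ra, dec) X-ray sources."""
    if not xray_sources:
        return "outside_footprint"
    nearest = min(
        angular_separation_arcsec(candidate[0], candidate[1], ra, dec)
        for ra, dec in xray_sources
    )
    if nearest <= MATCH_RADIUS:
        return "matched"
    if nearest <= FOOTPRINT_RADIUS:
        return "in_footprint_unmatched"
    return "outside_footprint"


xray = [(150.0, 2.0)]  # illustrative position only
print(classify_candidate((150.0, 2.0 + 0.5 / 3600.0), xray))
print(classify_candidate((150.0, 2.0 + 100.0 / 3600.0), xray))
```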

Chandra data in COSMOS cover a slightly smaller region. Using the Chandra Source
Catalog^14 we find 934 X-ray sources, of which 125 match to our candidates, with 3 of those
being objects without existing spectroscopy. However, there are another 125 of our quasar
candidates within this X-ray footprint. Twenty of those are robust unknown sources, with 7 that
are z > 3.5 candidates with i < 22 (all of which are included in the XMM matching above).
The average magnitude for these X-ray matches is i = 20.75 and for the non-matches is
i = 21.54. The mean photometric redshift for matches is z = 1.24 and for non-matches is
2.69.


In principle, we could use morphology to further test the likelihood of the quasar
classification of our candidates. However, the SDSS star-galaxy separation becomes unreliable
at a brighter limit than our candidates. Although deep HST data are available in the COSMOS
area (Scoville et al. 2007a), it is not definitive. While the known bright quasars do
tend to have point-like morphologies, the faint quasars (even at high-z) can be extended
(host dominated) at the depth of the HST data. That said, any follow-up spectroscopy of
COSMOS candidates should clearly consider the HST data for prioritization as 6 of the 12
new high-z candidates have stellar morphologies from HST (with the 5 non-matches to the
HST data all being near the edges of the COSMOS field).


If even a fraction of our mid- and high-z quasar candidates in the COSMOS area are
real quasars, it would significantly increase the number of such objects. Compared to only


14 http://cxc.harvard.edu/csc/

5 known z > 3.5 quasars among our candidates, we saw above that there are 17 robust
z > 3.5 candidates just within the X-ray footprint of the field, 6 of which have stellar HST
morphologies, suggesting that the existing density of relatively bright high-z quasars in the
COSMOS field is at least ~50% incomplete.
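
The ~50% figure can be checked with a line of arithmetic. The sketch below assumes, for illustration only, that some number of the new candidates are genuine z > 3.5 quasars and asks what fraction of the resulting total is missing from the known sample; the function name is ours.

```python
def incompleteness(n_known, n_new_real):
    """Fraction of the true population missing from the known sample,
    if n_new_real of the new candidates are genuine quasars."""
    return n_new_real / (n_known + n_new_real)


N_KNOWN = 5  # known z > 3.5 quasars among our COSMOS candidates

# Only the 6 candidates with stellar HST morphologies are real.
stellar_only = incompleteness(N_KNOWN, 6)
# All 17 robust candidates within the X-ray footprint are real.
all_robust = incompleteness(N_KNOWN, 17)

print(f"{stellar_only:.0%} to {all_robust:.0%} incomplete")
```

Under the conservative case (only the 6 stellar-morphology candidates are real), more than half of the population would be missing from the known sample, in line with the "at least ~50%" statement.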


The photometric redshifts for COSMOS sources presented by Salvato et al. (2011) should
be superior to ours and can be used to cross-check our results. However, only 36 of our
candidates match: 12 mid-z and just 1 high-z, likely because of the restriction to X-ray sources
in Salvato et al. (2011). Of these, 17 have photometric redshifts that agree with ours to
within ±0.3 (9 to ±0.1), including the high-z candidate (COSMOS ID: 1980473) with
photometric redshifts of 3.295 vs. 3.329.


Brusa et al. (2009) report 40 z > 3 quasars in the COSMOS field (22 spectroscopic, 18
photometric). We recover only 7 of those (all of which already appear in the master quasar
catalog); however, this is not surprising as, of the 33 missing, 32 have i > 20.5 (the peak of
our distribution) and 27 have i > 22. Thus the Brusa et al. (2009) objects are much fainter
than those cataloged herein.


As with the COSMOS field, the Bootes field has also been subject to considerable
spectroscopic exploration, primarily from the AGES program (Kochanek et al. 2012). Within a
rectangular area defined by the minimum and maximum RA and Dec of the deep Spitzer
data taken as part of the Spitzer Deep, Wide-Field Survey (SDWFS; Ashby et al. 2009),
we find 1861 quasar candidates. Among these are 1085 confirmed quasars, 2 stars, and
3 galaxies, leaving 771 unknown objects. However, the Spitzer data do not fully cover this
space: there are 1738 candidates (of which 681 have no spectroscopic data) that are included
within the approximate boundaries of the MIR data. Some of those objects fall outside of
the boundaries of the AGES spectroscopic program (Kochanek et al. 2012, Figure 2), but
nevertheless have the deep MIR data needed to perform robust MIR selection.


Matching back to the AGES spectroscopy (to recover non-quasars not included in the
training set), we find an additional 3 spectroscopically-confirmed stars and 36 spectra that
resulted in unknown redshift/classification. A search of the NASA Extragalactic Database
for additional spectroscopic data revealed only one new object, FBQS J142607.7+340426,
that was not included in our master quasar catalog.


Thus, as with the COSMOS field, the Bootes field also contains many new quasar
candidates, despite considerable efforts to confirm likely AGN. Of the 771 unknown candidates,
we find that 294 are robust, with 46 being robust z > 3.5 candidates with i < 22.


As a result of this analysis of quasar candidates in the COSMOS and Bootes fields, we
conclude that there is a potential for significantly increasing the number of relatively-bright
high-z (and mid-z) quasars in that area of sky, despite considerable existing spectroscopic
coverage of the field. The density of objects in these (and other Spitzer deep fields) is
particularly useful for absorption studies, making additional confirming spectroscopy worthwhile.


5.4. Demographics 


One of our goals was to fill in the gaps at redshifts where optical-only quasar selection has
traditionally been incomplete. The SDSS selection algorithm (Richards et al. 2002) targets
both low-redshift and high-redshift quasars to i < 19.1. To that limit the SDSS quasar
sample is expected to be quite complete at z < 2.2, with known incompleteness at z ~ 2.7
and z ~ 3.5 (Vanden Berk et al. 2005; Richards et al. 2006; Worseck & Prochaska 2011).
Similarly the BOSS selection algorithm (Ross et al. 2012a) is limited to ~2.2 < z < ~3.5
and has known incompleteness at z ~ 2.9 (Ross et al. 2012b). We would thus expect to find
that our method would have little new to offer in terms of new bright quasars (i < 19.1) at
z < 2.2, but may significantly improve quasar selection around z ~ 2.7 and z ~ 3.5. We
might expect somewhat more new quasar candidates between 19.1 < i < 20.2 as SDSS did
not target quasars at z < 3 fainter than i = 19.1 (reserving the fainter targets for z > 3
candidates, targeted to i < 20.2) and BOSS did not explicitly target z < 2.2 quasars.


In this light, we have matched our candidate list to the full master catalog (to determine
which of these objects are new candidates), to the training sets (to determine the
completeness with respect to the quasar training set), and to the full SDSS-III spectroscopic database
(to identify known non-quasars). Figure 12 compares the number of known spectroscopic
quasars, our robust quasar candidates, and those robust candidates without existing
spectroscopy. Comparing the low-z quasars/candidates (blue lines) we find that there are some
new quasars at i < 19.1 (the SDSS spectroscopic limit for z < 3), which may reflect our
sensitivity to low-luminosity AGNs in compact galaxies. There are also hundreds of thousands
of new low-z objects at fainter magnitudes.


For mid-z selected quasars (2.2 < z < 3.5), Figure 12 shows that our catalog provides
relatively little in terms of new sources at i < 19.1 and i > 21.0. However, at intermediate
magnitudes, the number of new candidate mid-z quasars is quite substantial. In some sense
this is surprising as the SDSS-III BOSS project was specifically designed to find quasars in
this magnitude and redshift range. At the same time, it is known that BOSS is only ~60%
complete (Ross et al. 2012b), so it is quite possible that we are simply turning up the remaining
objects missed by BOSS. For high-z selected quasars (z > 3.5), we again find relatively few
new objects brighter than the SDSS spectroscopic limit (here i < 20.2), but there is a
significant population of new candidates at fainter magnitudes, consistent with the difficulty
that standard wedge-based IR selection of AGNs (Lacy et al. 2004a; Stern et al. 2005, 2007)
has in recovering objects at these redshifts.

15 Note that the thin solid lines in Figure 12 show the total number of known spectroscopic quasars, not
the number of such objects that also have mid-IR photometry. Thus even if the candidate object number
counts are below the spectroscopic counts, that does not necessarily indicate that we are incomplete to known
quasars with MIR data.


The expected redshift distribution of the robust new candidates is shown in Figure 13.
We have computed the ratio of the photometric and spectroscopic redshift distributions for
the spectroscopically confirmed quasars in our training set. This enables a rough correction
of the photometric redshift distributions of our candidates to an expected spectroscopic
redshift distribution (shown in blue, green, and red for low-z, mid-z, and high-z candidates,
respectively). As noted above, the low-z candidates are largely faint sources; they
generally have photometric redshifts consistent with their low-z selection. The mid-z candidates
have a large range of photometric redshifts, which suggests photo-z degeneracy and/or
contamination. There are a large number of mid-z candidates with photometric redshifts of
z ~ 2.7 and z ~ 3.5, which is encouraging as these are redshift regions where we know that
optical-only selection is incomplete (Richards et al. 2002; Richards et al. 2006). The high-z
candidates all have redshift estimates consistent with their selection, with a large number of
new objects spanning 3.6 < z < 4.6.


We find that most of the new candidates are at fainter magnitudes and/or come at
redshifts where it is difficult to do optical-only, variability-only, or infrared-only selection of
quasars. For example, many new candidates are at high redshift, where objects tend to be biased
against by traditional mid-IR selection methods as noted above and also by variability
selection methods. Overall, there are 7874 robust high-z quasar candidates. If all turned out to
be quasars, this would more than double the number of such quasars in the SDSS footprint.
Many of these candidates are very faint, but the distribution peaks at i ~ 20.5, likely
reflecting the cutoff of i = 20.2 for high-z quasar selection in SDSS-I/II. In the mid-z range there
are 81,321 robust quasar candidates. At low-z there are 424,448 robust quasar candidates.
Most of these are quite faint, and despite the catalog’s limitation to point sources, those
with z_phot < 1 are likely AGNs rather than luminous quasars.


Many of these candidates are identified in our previous photometric quasar catalogs.
However, a total of 87,242, 34,059, and 6779 low-z, mid-z, and high-z candidates, respectively,
do not already appear in Richards et al. (2009a) or Bovy et al. (2011).






























Fig. 12.— Number counts of known quasars and robust quasar candidates as a function of 
magnitude and redshift range. Blue lines show the number of known quasars with z < 2.2
(“spec”; thin), the number of low-z selected candidates (“cand”; dotted) and the number of
low-z selected candidates that lack spectroscopic confirmation (“new”; dashed). Similarly
green and red lines give the number of 2.2 < z < 3.5 (or mid-z selected) and 3.5 < z < 5.5
(or high-z selected) quasars and quasar candidates. The green curves are scaled up by a
factor of 2 and the red curves are scaled up by a factor of 6 in order to make the figure more
legible.


5.5. Number Counts/Luminosity Function 

A particularly useful test for a sample of photometric quasars is a comparison of their
number counts to those of known quasars. Problems with efficiency/contamination will
show up as an excess of quasars (particularly at bright magnitudes), while problems with
completeness will show up as a dearth of quasars.

Fig. 13.— Number of quasars as a function of redshift. The solid black line gives the
spectroscopic redshift distribution of the quasars in our training set, while the dotted black
line gives the photometric redshift distribution for those same sources. The ratio of these
two is used to perform a first-order correction of the photometric redshift distribution of
our candidates. Corrected photometric redshift distributions for the robust new candidates
(spectroscopically-confirmed sources removed) are shown in blue for low-z candidates (scaled
down by 3x to fit on the axis), green for mid-z, and red for high-z. All histograms are scaled
up by 10x for z > 3.6 to better show the high-redshift distribution.


In Figure 14 we reproduce Figure 9 from Richards et al. (2009a), which showed both the
spectroscopic and photometric quasar number counts in two redshift ranges. Here we have
overplotted the number counts of our quasar candidates selected as low-z, mid-z, and high-z
candidates.


In this figure, open points represent the raw number counts, while the closed points
give the completeness-corrected number counts. As we will do for the luminosity function
analysis below, the objects going into the raw number counts presented here are limited to
those in the SDSS “legacy area” (area = 10778.306 deg^2) and are classified as either quasars
or unknown. The unknown sources are restricted to robust candidates as defined above
in Section 5.2. The completeness corrections for this sample are given by the fraction of
master quasar catalog objects recovered by our algorithm with these constraints as shown
in Figure 15. This analysis converts the raw counts to the total number of quasars expected
(accounting for incompleteness of the selection algorithm, lack of mid-IR photometry,
non-stellar morphology, and flag-rejection).
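
The correction described above amounts to dividing the raw counts in each magnitude bin by the recovered fraction in that bin. The following is a minimal sketch of that step with Poisson errors scaled in the same way; the function name and the numbers in the example are hypothetical and are not values from our catalog.

```python
import math


def corrected_counts(raw_counts, recovered_fraction):
    """Return (corrected count, Poisson error) per bin.

    raw_counts: raw number of objects in each magnitude bin.
    recovered_fraction: fraction of master-catalog quasars recovered in
    each bin (must be in the interval (0, 1]).
    """
    results = []
    for n, f in zip(raw_counts, recovered_fraction):
        if not 0.0 < f <= 1.0:
            raise ValueError("recovered fraction must be in (0, 1]")
        results.append((n / f, math.sqrt(n) / f))
    return results


# Illustrative numbers only.
for count, err in corrected_counts([400, 50], [0.8, 0.5]):
    print(f"{count:.1f} +/- {err:.1f}")
```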

We specifically find that the corrected low-z number counts are a good match to the
spectroscopic number counts at i ~ 17, being somewhat incomplete at i ~ 19 (but probing
to i ~ 20.5), and exhibiting perhaps a factor of two contamination in the brightest bin
shown. For mid-z quasars our sample appears to be filling in the gap in the SDSS selection
over 19.1 < i < 20.2, while exhibiting less contamination than our previous photometric
sample (as evidenced by a lack of a plateau at the bright end). The high-z number counts
do not show any obvious sign of contamination from bright stars (once we have imposed the
restrictions noted above).


These number counts are thus consistent with our new catalog being both relatively
complete (to within a deterministic correction) and efficient. If the efficiency were low (and
thus the contamination were high), we would expect significant deviations from the slopes
of the spectroscopic number counts. We see none of the excess in our current photometric
sample that we saw in the high-z sample from Richards et al. (2009a), and the faint-end counts
are consistent with the optical+infrared selected candidates from Richards et al. (2009b,
Fig. 12), which performed a selection similar to our current selection, but over a much
smaller area of sky (~24 deg^2).

While our goal in this work was not to determine the luminosity function of quasars,
but rather to take the next step in creating optimal photometric catalogs of quasars, it is
nevertheless useful to examine the quasar luminosity function (QLF) as determined from our
catalog. In Figure 16 we show the absolute magnitude (luminosity) and photometric redshift
distribution of our data using the same redshift and luminosity bins as Richards et al. (2006)
and we compare the resulting luminosity function in these bins in Figure 17. We have taken
the limiting magnitude to be i < 21 as shown, since that is where our completeness falls
below 50% according to Figure 3. However, there is no single limiting magnitude for this
investigation as we have simply matched all of the SDSS optical sources with MIR sources
from WISE and Spitzer. Objects fainter than i = 21 can be included in the catalog if they
are bright enough in the MIR, but they are excluded from our main QLF analysis. The
gradient in the density of points near the i = 21 limit in Figure 16 might suggest that we are
complete to deeper than this limit at low-z, but also that the completeness limit is at a
somewhat brighter magnitude at high-z.

Fig. 14.— Quasar number counts as a function of redshift and i-band magnitude. Black and
gray points, respectively, give the spectroscopic and photometric number counts as reported
in Richards et al. (2009a; e.g., Fig. 9); circles for z < 2.2 and triangles for 3 < z < 5.
The open blue, green, and red squares give the raw number counts (with 1σ Poisson error
bars) for the candidates reported herein. The filled colored squares give the number counts
corrected using Figure 15. The mid-z and high-z samples bracket the redshift space of the
old 3 < z < 5 sample, but show no sign of the contamination at the bright end (flattening
of the number counts) seen in the old sample.

Fig. 15.— Ratio of master catalog objects recovered by our algorithm to the full
master quasar catalog (prior to matching to mid-IR photometry). This corrects for objects
too faint in the mid-IR to match the optical, objects rejected from the IR catalog due to
flags, the exclusion of extended sources, and the incompleteness of the selection algorithm
itself. The fraction is given as a function of i-band magnitude in the three redshift ranges
we have explored (low-z: blue, mid-z: green, high-z: red).

To produce the QLF results shown in Figure 17 we restricted the catalog using the
same cuts as above for the number counts, namely limiting to known quasars and “robust”
unknown sources, both within the legacy area. In this presentation we make two corrections
to the raw data. First, we correct for incompleteness as a function of i-band magnitude and
redshift by weighting by the fraction of training set quasars recovered by our algorithm. Next
we correct the photometric redshifts by weighting each object by the ratio of the number of
spectroscopic redshifts to photometric redshifts for our training set quasars. That is, if there
were really 100 spectroscopic training-set quasars at z = 1.45-1.55, but the photometric
redshift estimates for those quasars placed 120 quasars in the same bin, then we would
weight each new photometric quasar candidate in that photo-z bin by 100/120.
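
The photometric redshift correction in the worked example above is a per-bin ratio of histograms. A minimal sketch follows; the function name is ours, the second bin in the example is made up, and the treatment of empty photometric bins (weight of zero) is an assumption rather than something specified in the text.

```python
def photoz_bin_weights(n_spec, n_phot):
    """Per-bin weights: number of training-set quasars with spectroscopic
    redshifts in the bin divided by the number with photometric redshifts
    in the same bin. Empty photometric bins get a weight of zero (assumed)."""
    return [s / p if p > 0 else 0.0 for s, p in zip(n_spec, n_phot)]


# First bin reproduces the example in the text (100 spec vs. 120 photo-z);
# the second bin is purely illustrative.
weights = photoz_bin_weights(n_spec=[100, 80], n_phot=[120, 64])
print([round(w, 3) for w in weights])
```

Each new candidate whose photometric redshift falls in a given bin would then be counted with that bin's weight rather than with unit weight.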


Fig. 16.— Absolute magnitude (luminosity) vs. redshift for our photometric quasar
candidates with number of objects displayed as gray-scale hex bins. Teal lines indicate the bright
limit of SDSS and the adopted limiting magnitude of our QLF analysis. The light gray grid
lines delineate the bins used to compute the QLF in Figure 17.


We find reasonable agreement with the spectroscopic QLF points of Richards et al.
(2006) given that the focus of this work was not the rigorous computation of the QLF.
Specifically the black points in Figure 17 are in good agreement with the SDSS points (gray)
down to the flux limit of SDSS and appear to be well-behaved another magnitude deeper
than the SDSS data.


An exception is the deviation from Richards et al. (2006) seen in the z = 4.25 panel,
where our photometric sample has a space density that is a factor of a few higher than SDSS
at M_i(z = 2) ~ -27. This is likely to be caused either by contamination from non-quasars
in our sample or under-correction of the SDSS completeness in this redshift range. If it is
incompleteness, the origin may be a greater sensitivity of our method to dust-reddened (but
unobscured) quasars. Indeed, Lacy et al. (2015), using an MIR-selected sample, similarly
exhibits a steeper QLF slope than Richards et al. (2006) and is more consistent with the
results of Jiang et al. (2008), Ross et al. (2012b), and McGreer et al. (2013). At z = 4.75 the
errors are somewhat larger, but the QLF is broadly consistent with McGreer et al. (2013).

Fig. 17.— Quasar luminosity function in 11 redshift bins. Filled black circles are photometric
objects from our catalogs brighter than the adopted limiting magnitude. Open circles are
those where the limiting magnitude cuts through the (L, z) bin and thus have uncertain
corrections (error bars are Poisson only), while the open triangles indicate (uncorrected)
lower limits. Grey points are the spectroscopic QLF values from Richards et al. (2006),
where the dashed grey line repeats the spectroscopic QLF from z = 2 in each redshift panel.
In the z = 4.75 panel, we overplot the data (purple and teal) and best fit (dashed black
line) from McGreer et al. (2013). The photometric QLF matches the spectroscopic QLF
quite well, especially considering that this was not one of the goals of this investigation.
The excess density at z = 4.25 indicates either an under-correction of the completeness by
Richards et al. (2006) or contamination in our sample, likely a combination of both.


5.6. Future 

One of the goals of this work is to set the stage for next-generation clustering
investigations using high-redshift quasars. The SDSS quasar sample lacks sufficient density to
test the luminosity dependence of quasar clustering (Lidz et al. 2006) such as proposed by
Hopkins et al. (2007). For example, the work by Shen et al. (2007) used a sample of only
~4000 quasars at 2.9 < z < 5.4 over ~4000 deg^2. Here we cover more than double that
area and nearly double the sample size, but over an even smaller redshift range. The various
optical and MIR deep fields would enable the discovery of more objects by probing much
deeper, but they are limited in their utility for high-z clustering investigations by their small
area and the MIR bias against high-z quasars.

Substantial gains should come from pairing this method with the data coming from
the SpIES project (Timlin, Ross, Richards et al. 2015), which has just completed tiling
~125 deg^2 of the SDSS Stripe 82 region (e.g., Annis et al. 2014). We can estimate the
number of high-z quasars in the SpIES area from the SWIRE ELAIS-N2 field (4.2 deg^2),
which has the same depth as SpIES (but has not been covered by SERVS). In that field we
find 32 high-z quasar candidates, 24 of which appear to have robust photometric redshifts.
Thus we predict that SpIES will contain of order 5-7 high-z quasars per square degree, or a
total of 625-875 objects. This density should be sufficient for powerful tests of the clustering
of quasars as a function of luminosity at high redshift.
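
As a rough check of the arithmetic above, the scaling from ELAIS-N2 to SpIES can be written out explicitly. This is only a sketch: the function name `surface_density` is hypothetical, and the inputs are simply the counts and areas quoted in the text.

```python
def surface_density(n_objects, area_deg2):
    """Objects per square degree for a field of the given area."""
    if area_deg2 <= 0:
        raise ValueError("area_deg2 must be positive")
    return n_objects / area_deg2


ELAIS_N2_AREA = 4.2  # deg^2, quoted in the text
SPIES_AREA = 125.0   # deg^2, approximate SpIES coverage quoted in the text

# 24 candidates with robust photo-z and 32 candidates in total (from the text).
density_robust = surface_density(24, ELAIS_N2_AREA)
density_all = surface_density(32, ELAIS_N2_AREA)
print(f"{density_robust:.1f} to {density_all:.1f} candidates per deg^2")

# The text rounds to "of order 5-7" per deg^2 before scaling to SpIES.
total_low = 5 * SPIES_AREA
total_high = 7 * SPIES_AREA
print(f"{total_low:.0f} to {total_high:.0f} objects over SpIES")
```

The unrounded densities are about 5.7 and 7.6 per deg^2; the quoted total of 625-875 objects follows from the rounded range of 5-7 per deg^2.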

This work is further a proof of concept for future quasar surveys using both ground- and
space-based data, such as could be done by combining photometric data from Pan-STARRS
(Kaiser et al. 2002) (grizy), SkyMapper (Keller et al. 2007) (uvgriz), the Dark Energy Survey
(The Dark Energy Survey Collaboration 2005) (grizY), Hyper Suprime-Cam^16 (grizy),
the Large Synoptic Survey Telescope (Ivezić et al. 2008) (ugrizy), the NEOWISE extension
to the WISE program (Mainzer et al. 2014) (using the two shortest WISE bandpasses),
Euclid (Laureijs et al. 2012) (YJH) or for future spectroscopic programs like the Dark Energy
Spectroscopic Instrument (DESI; Schlegel et al. 2011). We have shown that using the
combination of optical and MIR photometry is better (for unobscured quasars) than either data
set alone and that there are considerable gains to be made from the use of modern statistical
methods in performing multi-dimensional selection.


6. Conclusions 

Using a proven kernel density estimation technique, we identify 885,503 type 1 quasar 
candidates within the imaging footprint of the Sloan Digital Sky Survey by combining the 
SDSS optical data with mid-IR imaging from WISE and Spitzer. Among these objects 
are 6779 robust, 3.5 < z < 5 quasar candidates that have no previous spectroscopic or
photometric classification. This increase is possible due to incompleteness of MIR-only color
selection in this redshift range and the difficulty of variability selection for faint, high-redshift
quasars, and offers an opportunity to expand our exploration of the high-redshift universe.

16 http://www.naoj.org/Projects/HSC/surveyplan.html

The optical and MIR color distributions shown in Figures 8 and 9 are good matches to
the distributions of the training set quasars, but extend to fainter limits in both the optical 
and MIR. They also clearly demonstrate an increased completeness to high-redshift quasars 
(particularly at 3.5 < z < 5 where MIR color selection is incomplete due to spectral features 
pushing the colors of these objects bluer than typical MIR color-cuts). 

Photometric redshift estimates of these candidates using optical and MIR photometry
are accurate to Δz ± 0.3 at least 83% of the time, improving to 93% where there also
exists near-IR photometry; see Figure 10. Comparison with the known colors of objects at
the expected redshift (Figure 11) can help to identify potential contaminants and/or those
objects with erroneous photo-z.
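
The accuracy statistic quoted above is the fraction of objects whose photometric redshift falls within a fixed tolerance of the spectroscopic value. A minimal sketch follows, assuming the criterion is an absolute difference |z_phot - z_spec| <= 0.3 (not normalized by 1 + z). The function name `fraction_within` and the toy redshifts are hypothetical and are not drawn from the catalog.

```python
def fraction_within(z_phot, z_spec, tol=0.3):
    """Fraction of objects with |z_phot - z_spec| <= tol."""
    if len(z_phot) != len(z_spec):
        raise ValueError("z_phot and z_spec must have the same length")
    if len(z_phot) == 0:
        raise ValueError("need at least one object")
    hits = sum(1 for zp, zs in zip(z_phot, z_spec) if abs(zp - zs) <= tol)
    return hits / len(z_phot)


# Toy values for illustration only (assumed, not catalog data).
z_spec = [0.5, 1.2, 2.4, 3.6, 4.1]
z_phot = [0.6, 1.1, 2.9, 3.5, 4.2]

frac = fraction_within(z_phot, z_spec)
print(f"{100 * frac:.0f}% of objects lie within 0.3 of the spectroscopic redshift")
```

In this toy sample only the third object (offset 0.5) falls outside the tolerance. Applied to a real matched sample, the same function would yield a statistic of the kind quoted in the text.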

Our new candidates even include robust targets within the well-covered COSMOS and 
Bootes fields, where an increased density of spectroscopic quasars would aid in clustering 
and absorption line studies. This includes over 50 robust, new high-z quasar candidates in
both of the fields (where there exists deeper-than-average MIR photometry). 

Generally our algorithm is simply finding quasars that are fainter than the SDSS spectroscopic
limits, and that should not necessarily have received SDSS spectroscopic followup.
However, there are a number of bright low-z candidates without SDSS spectroscopy that are
likely to be low-luminosity AGNs rather than luminous quasars. Figures 12 and 13 present
the magnitude and expected redshift distributions of both the new candidates and the known
quasars.

We are able to explore the completeness and contamination of the method using number
counts and luminosity function analysis. Figure 14 demonstrates that our algorithm is
relatively complete to known low-z quasars (accounting for our restriction to optical point
sources) and shows no obvious sign of contamination from bright stars at any redshift. The
QLF shown in Figure 17 agrees well with the results from SDSS (Richards et al. 2006), but
suggests a steeper slope to the QLF at high-z (consistent with McGreer et al. 2013) and may
be more sensitive to dust-reddened quasars. Future work will expand on that presented herein
by incorporating more information (variability, proper motion, etc.) and using survey data
that probes deeper in the optical.